A Path-Goal Approach to Productivity ¹


Basil S. Georgopoulos, Gerald M. Mahoney,² and Nyle W. Jones, Jr.


Survey Research Center, University of Michigan 


The problem of the adequate uncovering 
and proper evaluation of social-psychological 
factors in the area of organizational effective- 
ness constitutes a major aspect of current 
thinking and research. Since it entails con- 
siderable theoretical interest as well as ac- 
tion implications, it has received constant at- 
tention in the programmatic planning of the 
Organizational Behavior and Human Rela- 
tions Program, Survey Research Center. In 
general, the program has focused upon the 
variable of productivity as one type of or- 
ganizational effectiveness. According to this 
position, the performance of people in or- 
ganizations may be considered as reflecting 
the relative attainment of important organi- 
zational objectives, and its prediction should 
contribute to our understanding of human be- 
havior. In the present article, we shall be 
concerned with the prediction of individual 
productivity in industrial settings, approach- 
ing an old problem in a new way. 


Problem 


The question is why some workers tend to
be high producers, or why persons of largely
similar background who are engaged in the
same activity under comparable conditions
exhibit considerable variability in output.
Specifically, what determines high productivity?
In attempting to provide an answer
to this problem, previous studies in the program
have explored the relationship of several
factors to productivity, employing various
approaches. These included “morale,” certain
job satisfactions, supervisory practices,
and group cohesiveness (1, 2, 3, 4, 7, 8). The
results, inconsistent and inconclusive in many
cases, pointed to the complexity of the problem,
suggesting a number of hypotheses. It
clearly emerged that productivity is the resultant
of a complex of factors, both individual
and situational, both phenomenal and
objective. On either side, both rational and
nonrational factors appeared to be involved,
some being forces toward and others against
high productivity.

¹ The data of this article are from a study conducted
by the Organizational Behavior and Human
Relations Program of the Survey Research Center.
The authors wish to thank Robert L. Kahn, Program
Director, and Seymour Lieberman for their
help and suggestions in this research. They would
also like to thank L. Richard Hoffman, Leo Meltzer,
and Arnold S. Tannenbaum for their helpful comments
concerning this article.

² Not now connected with the University of Michigan.


The Path-goal Hypothesis 


Beginning with the notions that individuals 
in the work situation have certain goals in 
common, the achievement of which would 
satisfy certain corresponding needs, and that 
behavior is in part a function of rational 
calculability, or decision making in terms of 
goal-directedness, we arrived at a path-goal 
approach. This approach is based on the following
assumptions: individual productivity
is, among other things, a function of one’s mo- 
tivation to produce at a given level; in turn, 
such motivation depends upon (a) the par- 
ticular needs of the individual as reflected in 
the goals toward which he is moving, and 
(b) his perception regarding the relative use- 
fulness of productivity behavior as an instru- 
mentality, or as a path to the attainment of 
these goals. 

People have certain needs in common so
that, brought into a common situation, they
will seek and pursue among available goals
those which promise to satisfy their needs.
The path to be followed in a given case will
be a function of the expectations of the individual.
If high productivity is perceived as
the appropriate path, the individual should
become a high producer. This formulation
may be simply stated in the form of the following
general hypothesis, labelled as the
“path-goal hypothesis”: If a worker sees high
productivity as a path leading to the attainment
of one or more of his personal goals, he
will tend to be a high producer. Conversely,
if he sees low productivity as a path to the
achievement of his goals, he will tend to be a
low producer. The first aspect of this hypothesis
is probably more general and relevant
in our society; it is doubtful whether
many people see low productivity as helping
the achievement of many of their goals.

According to this hypothesis, if, for exam- 
ple, a worker has a need to be liked by his co- 
workers and he sees high (or low) produc- 
tivity as a path to the attainment of his goal 
to get along well with the work group, we will 
predict that he is likely to follow this path 
and, in effect, become a high (or low) pro- 
ducer. Such likelihood, however, depends on 
at least two important conditions: the path 
will be presumably chosen if his need(s) is 
sufficiently high, or if his goal(s) is relatively 
salient, and if no other more effective and 
economical paths are available to him. And 
while the latter condition ordinarily holds in 
the work situation, the former must be de- 
termined. 

Furthermore, even if the high productivity 
path is chosen, i.e., the worker is motivated 
to produce at a high level, we cannot be sure 
that he will in fact become a high producer. 
This would be the case if there were no re- 
straining forces, if no barriers blocked the de- 
sired path. Such factors, acting as limiting 
conditions, may hinder the translation of motivation
to produce into actual productive behavior
of a given level. This, therefore, requires
that the person be relatively free to
follow the desired path. In view of these 
qualifications, the relationship between path- 
goal perception and productivity should be 
more pronounced among workers who have a 
high need for a given goal and who encounter 
no barriers. This suggests that the path-goal 
hypothesis should hold better under the con- 
dition of high need and under the condition 
of freedom from restraining factors, and that
it should hold best under high need and free- 
dom combined. 

The major independent variable studied is 
the worker’s perception of the usefulness or 
of the instrumentality of productivity as a 
path leading to job related goals. Such path- 
goal perceptions may be conceived as expec- 
tations, or psychological probabilities, of vary- 
ing amounts of environmental return of a 
given kind, as a consequence of certain be- 
havior. A path-goal perception pattern would 
seem to reduce to the question, “How much 
payoff is there for me toward attaining a per- 
sonal goal while expending so much effort to- 
ward the achievement of an assigned organi- 
zational objective?” We, of course, recognize 
that several factors affect the emergence, pat- 
terning, and change of path-goal perceptions. 
In this study, however, these perceptions, as 
reported by the workers, are considered as 
“given,” and reference is made to only one 
kind of behavior—productivity. In addition 
to the path-goal perception variant, the role 
of two other pertinent conditions is taken into 
account. These are the level of need of the 
person, or the relative significance of each of 
his particular goals, and his level of freedom 
from constraining factors. 

In summary, according to the present approach,
behavior is viewed as a function of
needs, expectations, and situations. Produc- 
tivity level is seen as representing purposive 
behavior which is determined through the in- 
teraction of both facilitating and inhibiting 
forces, forces in the individual and in the en- 
vironment. More specifically, it is seen as a 
function of path-goal perception, level of need, 
and level of freedom. These three factors
should in part determine one’s productivity 
behavior. Theoretically, additional social- 
psychological variables, such as group norms, 
should account for the rest of the variance 
in individual performance. 

This orientation is similar to the theoretical 
positions of Lewin and Tolman. Lewin con- 
tended that behavior (locomotion) takes place 
over paths, some of which are more direct 
than others in relation to a given goal (5). 
He also distinguished between driving forces 
toward a goal and barriers, which impede 
locomotion, and asserted that a behavioral
possibility depends on the nature of the ex- 
isting forces and the valence, positive or 
negative, of the particular goal (6). Simi- 
larly, according to Tolman’s model, the pos- 
sibility of a behavior is a function of the 
needs and of the “means-end readiness” of
the individual in the given stimulus situation
(9).


The Sample of the Study 


The path-goal hypothesis was submitted to
a first test in connection with a more encompassing
research project in a medium-size, unionized,
household appliances company. The
relevant data, in the form of a questionnaire,
were collected from the company’s entire individual
incentive worker population. However,
of the 722 workers, 62 were excluded
for low questionnaire comprehension (having
also been excluded from the larger project);
another 39 workers of nonascertained productivity
level were also excluded, thus leaving a
sample of 621 cases in the two plants operated
by this organization.

Nearly all of the workers (92%) were un- 
ion members, and most were male (78%) and 
married (77%), having one or two depend- 
ents (72%). Most grew up on farms or in 
small towns (76%). The average age of the 
group was 35, with 30% being in their 20’s 
and 46% being between 30 and 39. In terms 
of education, 35% had less than high school 
education, another 31% had some high school, 
and 32% had completed high school. Nearly 
half of these workers (43%) had less than
one year of experience on the job. Finally, a 
fact of relevance is that the majority of these 
people (58%) felt that on-the-job satisfac- 
tions are as important as those gained off the 
job. 

The incentive plan under which these peo- 
ple operated is of the “standard hour” type. 
Workers are periodically evaluated for se- 
lected time intervals, and an engineer esti- 
mates the relative effort of the person. With 
1.00 assumed as normal effort, a fast worker 
may be rated 1.30, or 30% over normal, etc. 
Finally, a series of allowances for such things 
as unavoidable delays, fatigue, personal time, 
etc., and a 5% constant allowance are added
to determine the standard time for a unit of
production. At the end, it may turn out that
a worker producing exactly at his normal
work standard is paid at the rate of 118%
of the hourly base rate for that job classification,
his productivity being 118%. A person
may raise his performance, and his wages,
by increasing his effort or work pace.


Method and Procedure 


As the operational definition and relevant measure
of the path-goal perception concept, responses to
two groups of questionnaire items were utilized.
One group was designed to ascertain how instrumental
high productivity is seen for attaining certain
job related goals, the other to ascertain how
instrumental low productivity is seen for achieving
these same goals. For each of a number of goal
items, this was done by having the worker evaluate
high productivity on a five-point scale, from “helping”
to “hurting” the attainment of a given goal;
the same evaluation was separately made for low
productivity. We, therefore, have the instrumentality
of high productivity and the instrumentality
of low productivity, respectively representing the
two evaluations.

In the case of the instrumentality of high productivity,
a worker who evaluates high productivity as
helping the attainment of a given goal is said to
have a positive path-goal perception, one who sees
it as hurting to have a negative perception, and one
who perceives neither to be the case to have a neutral
(irrelevant) perception.³ For the instrumentality
of low productivity, the terminology is reversed.
One who sees low productivity as helping is said to
have a negative perception from the point of view of
his productivity behavior, and one who sees it as
hurting to have a positive perception; a neutral alternative
is also used. According to the theory, high
producers would be those who have positive path-goal
perceptions (high productivity helps or low productivity
hurts). Those who express a neutral perception,
when possible (where the analysis cells are
large enough to work with), are eliminated from the
analysis. Where this is not possible, the neutral perception
group is combined with the negative to yield
more cases for study.
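
The coding rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article says the scale has five points running from “helping” to “hurting,” but it does not give the numeric coding, so the values 1 to 5 with 3 as the neutral midpoint, and the function name `classify_perception`, are hypothetical.

```python
def classify_perception(rating: int, instrumentality: str) -> str:
    """Classify one questionnaire response as a path-goal perception.

    rating: assumed coding, 1 = helps ... 3 = neither ... 5 = hurts.
    instrumentality: "high" or "low", the productivity level being rated.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    if instrumentality not in ("high", "low"):
        raise ValueError("instrumentality must be 'high' or 'low'")
    if rating == 3:
        return "neutral"
    helps = rating < 3
    if instrumentality == "high":
        # High productivity helps -> positive; hurts -> negative.
        return "positive" if helps else "negative"
    # For low productivity the terminology is reversed.
    return "negative" if helps else "positive"


def pool_neutral_with_negative(perception: str) -> str:
    """Fallback used when cells are too small to drop neutral responses."""
    return "negative" if perception == "neutral" else perception
```

For example, a worker who rates low productivity as hurting a goal is classified as a positive perceiver, the same class as a worker who rates high productivity as helping it.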

Among the specific goal items studied in connection
with the two productivity instrumentalities are
“making more money in the long run,” “getting
along well with the work group,” and “promotion
to a higher base rate.” The first item is unlike the
other two in that it did not present the respondent
with a neutral perception alternative. Of a series of
ten job related items, these three were selected for
full study because of their relatively higher importance
to the sample as a whole, as determined by
rank-ordering on the part of the respondents. With
10.00 being the lowest possible mean-rank that could
result from the combined judgments of all individuals
in the sample and 1.00 being the highest, the
mean-rank relevance of these items was 5.1, 4.5, and
[illegible].1, respectively.

³ Thus, the term “instrumentality” is used to refer
to the level of productivity, high or low, which is
the object of a path-goal perception; the terms
“positive,” “negative,” and “neutral” are used to
refer to the nature of the valence (value or sign) of
this perception, from the point of view of productivity
behavior.

For each goal item, the level of need of each individual
was determined inferentially. Those ranking
a particular goal as 1, 2, or 3 on a scale of importance
from 1 to 10 (the number of items ranked)
were pooled to form the high-need group; the remainder
constitute the low-need group with respect
to the same goal. The level of freedom for each individual
was ascertained by combining three job-relevant
factors, each of which was expected from
past experience to be related to productivity and
was in fact found to bear such a significant positive
relationship, at the .01 level. Specifically, those who
stated that they were free to set their work pace,
who had a minimum of six months’ experience on
the job, and who were aged between 20 and 59 years
inclusive constitute the free group; the remainder,
lacking in one or more of these characteristics, form
the not-free group. The rationale for this combination
is that absence of any of the three characteristics
may act as a barrier with respect to the freedom of
a person to vary his productivity so that he can
produce at a desired high level. The reader should
note that while the free-not free groups are the same
for all three goal items studied, the high-low need
groups vary from item to item.
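
A minimal sketch of these two grouping rules follows. The thresholds (ranks 1 to 3, six months, ages 20 to 59) come from the text; the function and parameter names are hypothetical, and experience is assumed to be recorded in months.

```python
def need_level(goal_rank: int) -> str:
    """High need if the goal was ranked 1, 2, or 3 among the 10 items."""
    if not 1 <= goal_rank <= 10:
        raise ValueError("goal_rank must be between 1 and 10")
    return "high" if goal_rank <= 3 else "low"


def freedom_level(free_to_set_pace: bool, months_on_job: float, age: int) -> str:
    """Free only if all three characteristics are present.

    Lacking any one of them places the worker in the not-free group.
    """
    is_free = free_to_set_pace and months_on_job >= 6 and 20 <= age <= 59
    return "free" if is_free else "not free"
```

Because `freedom_level` does not depend on the goal item, a worker falls in the same freedom group for all three goals, while `need_level` is evaluated once per goal and can differ from item to item.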

Finally, the measure used to ascertain productivity
level is based on the question: “What productivity
percentage figure do you usually hit in a day? (write
in the % below)” It was decided that
this reported, rather than a seemingly more “objective,”
measure be used. This was due to the anonymity
of the questionnaires, the difficulty in choosing
a suitable time baseline for the computation of
a person’s average productivity, and imperfections in
company records. Although the precise validity of
this measure cannot be determined from the data
themselves, past experience with similar cases gives
us no reason to question its validity for the sample
as a whole.

The division of the sample into high and low producers
was made on the basis of the obtained productivity
distribution. The median productivity for
the sample was 140%, the range being from 50% to
200% but with very few cases falling at either extreme.
However, a disproportionately large number
of workers, 24% of the sample, reported a productivity
of 140%, i.e., at the median point. This
argued in favor of dichotomizing the distribution at
the point between 140% and 141%, i.e., just above
the median, in order that we could be sure that
“high” producers are sufficiently differentiated from
“low” producers. Thus, the dichotomy resulted in
the creation of a high productivity group (productivity
of 141% or more) and a low productivity
group (140% or less). This dichotomy gave data
of relative symmetry and comparability, thus not
suggesting separate treatment for the two company
plants from which the sample derives. Similar percentages
of high as against low producers characterized
each of the two plants as well as the total
sample. We found that 27% of the 390 workers in
the one plant and 29% of the 231 workers in the
other were categorized as high producers, while 28%
of the total sample of 621 individuals were so classified
as a result of the above procedure.
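
The cutting point and the pooled percentage can be checked with a short sketch. The cutoff and the plant figures are taken from the paragraph above; the per-plant counts of high producers are not reported in the article, so they are reconstructed here by rounding the published percentages, which is an assumption. The function names are hypothetical.

```python
def productivity_group(reported_pct: float) -> str:
    """Dichotomize just above the median: 141% or more is 'high'."""
    return "high" if reported_pct > 140 else "low"


def pooled_high_share(plants):
    """plants: list of (number_of_workers, share_of_high_producers).

    Counts are reconstructed by rounding, since only percentages are given.
    """
    high = sum(round(n * share) for n, share in plants)
    total = sum(n for n, _ in plants)
    return high, total, high / total


high, total, share = pooled_high_share([(390, 0.27), (231, 0.29)])
print(high, total, round(share * 100))
```

Under this rounding assumption the two plants contribute about 105 and 67 high producers, which pools to roughly 28% of the 621 workers, in line with the figure reported for the total sample.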


Results 


The first operational prediction investigated 
constitutes a restatement of the path-goal hy- 
pothesis: 

Hypothesis 1. With respect to a given goal 
item, the percentage of high producers will be 
greater among workers having a positive path- 
goal perception (high productivity helps or 
low productivity hurts) than among those 
having a negative perception (high produc- 
tivity hurts or low productivity helps). 

Table 1 shows the relationship between 
path-goal perception and productivity in terms 
of percentages of high producers with refer- 
ence to each of three goal items, separately for 
the instrumentality of high and the instru- 
mentality of low productivity. These data 
support the hypothesis that high productivity 
will more often occur among those who have 
a positive path-goal perception. For exam- 
ple, of those who perceive high productivity 
positively with respect to making more money 
in the long run, 38% are high producers, in 
contrast to only 21% of those who see it 
negatively; similarly, of those who perceive 
low productivity positively (as hurting), 30% 
are high producers as against 22% of those 
having a negative path-goal perception. Simi- 
lar results obtain for the goal items of get- 
ting along well with the work group and pro- 
motion to a higher base rate. All six per- 
centage comparisons between positive and 
negative perceivers are as predicted by Hy- 
pothesis 1. Three of these differences, more- 
over, are statistically significant, by chi-square 
test corrected for continuity, at the .05 level 
or better. The data also show that the hypothesis
holds equally as well for the instrumentality
of low productivity as for the instrumentality
of high productivity, since the
pattern of results is about the same in both
cases.

Table 1
The Relationship Between Path-Goal Perception and Productivity
(Percentage of High Producers a)

                                                  Path-Goal Perception
                                      Instrumentality of High      Instrumentality of Low
                                      Productivity                 Productivity
                                      Positive      Negative       Positive      Negative
                                      (high pro-    (high pro-     (low pro-     (low pro-
Goal Item Involved in                 ductivity     ductivity      ductivity     ductivity
Path-Goal Perception                  “helps”)      “hurts”)       “hurts”)      “helps”)

More money in the long run            38% (234)     21% (376)*     30% (380)     22% (215)*
Getting along well with work group    32% (66)      23% (195)      33% (189)     28% (36)
Promotion to a higher base rate       26% (236)     23% (31)       32% (298)     12% (43)*

* These percentage differences are significant, by chi-square test, at the .05 level or better and in the predicted direction.
a Percentages are based on the number in the corresponding parentheses; the complement of each percentage, not appearing
in the table, represents the percentage of low producers in each case.

The substantiation of the path-goal hypothesis
also depends on the demonstration
that it holds better among workers who have
a high rather than a low need for a given goal
item:

Hypothesis 2. With respect to a given goal
item, the percentage difference of high producers
between those having a positive and
those having a negative (and/or neutral) ⁴
path-goal perception will be greater among
workers who have a high than among those
who have a low need for the same goal.

⁴ For Tables 2, 3, and 4, from the data of which
Hypotheses 2, 3, and 4 are tested, as was earlier
mentioned, the neutral is combined with the negative
perception category to yield enough cases for study.
This does not apply to the goal item “making more
money in the long run,” however, since the respondents
had not been offered a neutral choice in this case.

Table 2
The Relationship Between Path-Goal Perception and Productivity: When Controlling for Level of Need
(Percentage of High Producers a)

                                                  Path-Goal Perception
                                            High Need                    Low Need
                                                    Negative                     Negative
                                                    and/or                       and/or
                                      Positive      Neutral        Positive      Neutral
Goal Item                             (1)           (2)            (3)           (4)

Instrumentality of High Productivity
More money in the long run            38% (86)*     14% (119)      36% (129)*    25% (229)
Getting along well with work group    40% (25)      28% (181)      28% (36)      27% (326)
Promotion to a higher base rate       28% (87)      23% (117)      26% (137)     32% (213)

Instrumentality of Low Productivity
More money in the long run            26% (136)     17% (66)       32% (213)     [illegible]
Getting along well with work group    33% (66)      26% (135)      33% (109)     [illegible]
Promotion to a higher base rate       28% (93)      19% (107)      33% (180)     [illegible]

* These percentage differences are significant, by chi-square test, at the .05 level and in the predicted direction.
a Percentages are based on the number in the corresponding parentheses; the complement of each percentage, not appearing
in the table, represents the percentage of low producers in each case.

Table 2 presents the data which test Hypothesis 2.
Comparing for each goal item the
percentage difference between Columns 1 and 
2 (high need), with the corresponding dif- 
ference between Columns 3 and 4 (low need), 
we should expect in each case the former dif- 
ference to be higher than the latter, accord- 
ing to the hypothesis. As an example, if we 
consider the goal of making more money in 
the long run in the case of the instrumen- 
tality of high productivity, we find that un- 
der the condition of high need the difference 
in high producers between positive and nega- 
tive path-goal perceivers is 24%, compared 
to 11% under the condition of low need. In 
all, six such comparisons are possible, and 
five of these yield differences as predicted by 
the theory. Therefore Hypothesis 2 receives 
considerable support from the data. The data 
also show that the original path-goal hypothe- 
sis holds also when we control for level of 
need; regardless of level of need, high produc- 
tivity occurs more often among positive rather 
than negative perceivers. This is shown from 
the fact that 11 of the 12 possible differences 
between positive and negative perceivers are 
as predicted by Hypothesis 1. 
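
The comparison used to test Hypothesis 2 is a difference of differences, which the following sketch makes explicit for the three instrumentality-of-high-productivity rows of Table 2. The percentages are copied from the table; the dictionary and function names are hypothetical.

```python
# Percent high producers: (positive, negative/neutral) under high need,
# then (positive, negative/neutral) under low need. Columns 1-4 of Table 2.
TABLE2_HIGH_INSTRUMENTALITY = {
    "More money in the long run": (38, 14, 36, 25),
    "Getting along well with work group": (40, 28, 28, 27),
    "Promotion to a higher base rate": (28, 23, 26, 32),
}


def need_contrast(row):
    """Return (difference under high need, difference under low need)."""
    pos_high, neg_high, pos_low, neg_low = row
    return pos_high - neg_high, pos_low - neg_low


def supports_hypothesis_2(row) -> bool:
    """Hypothesis 2 predicts a larger difference under high need."""
    diff_high_need, diff_low_need = need_contrast(row)
    return diff_high_need > diff_low_need


for goal, row in TABLE2_HIGH_INSTRUMENTALITY.items():
    print(goal, need_contrast(row), supports_hypothesis_2(row))
```

For the money item this reproduces the 24-point and 11-point differences cited in the text. The same function applies unchanged to the freedom comparison of Hypothesis 3 if the four values are read as free and not-free columns.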

The third hypothesis introduces the
condition of freedom:

Hypothesis 3. With respect to a given
goal item, the percentage difference of high 
producers between those having a positive 
and those having a negative (and/or neu- 
tral) path-goal perception will be greater 
among workers who are free than among 
workers who are not free from constraining 
forces. That is, the path-goal hypothesis will 
hold better under the condition of freedom. 

Table 3 presents the data which test Hy- 
pothesis 3. A comparison identical with the 
one made with the data of Table 2 in con- 
nection with Hypothesis 2 (comparing the 
percentage differences between Columns 1 and 
2 with the corresponding differences between 
Columns 3 and 4) shows that five of the six 
possible percentage differences between posi- 
tive and negative perceivers who are free, on 
the one hand, and positive and negative per- 
ceivers who are not free, on the other, are as 
predicted by Hypothesis 3. Under the con- 
dition of freedom, the difference in high pro- 
ducers between those who have a positive and 
those who have a negative (and/or neutral) 
path-goal perception is greater than under the 
condition of no freedom. Furthermore, as in
the case of level of need, the data show that
the original path-goal hypothesis also holds
when we control for level of freedom, since,
regardless of level of freedom, high productivity
more often occurs among positive rather
than negative perceivers. If the twelve possible
comparisons between positive and negative
path-goal perception are made, it will be
found that nine of these are as predicted by
Hypothesis 1, there is no difference in one
case, and only two comparisons result in differences
of an opposite direction.

Table 3
The Relationship Between Path-Goal Perception and Productivity: When Controlling for Level of Freedom
(Percentage of High Producers a)

                                                  Path-Goal Perception
                                              Free                       Not Free
                                                    Negative                     Negative
                                                    and/or                       and/or
                                      Positive      Neutral        Positive      Neutral
Goal Item                             (1)           (2)            (3)           (4)

Instrumentality of High Productivity
More money in the long run            52% (103)*    35% (127)      27% (131)*    14% (249)
Getting along well with work group    38% (29)      43% (201)      21% (37)      18% (344)
Promotion to a higher base rate       43% (88)      43% (140)      16% (148)     21% (228)

Instrumentality of Low Productivity
More money in the long run            44% (158)     38% (65)       20% (222)     15% (150)
Getting along well with work group    52% (81)*     37% (138)      19% (108)     17% (260)
Promotion to a higher base rate       52% (114)*    33% (108)      20% (184)     16% (184)

* These percentage differences are significant, by chi-square test, at the .05 level and in the predicted direction.
a Percentages are based on the number in the corresponding parentheses; the complement of each percentage, not appearing
in the table, represents the percentage of low producers in each case.

To substantiate the path-goal orientation,
the data should finally demonstrate that the
path-goal hypothesis holds best among workers
who have a high need for a given goal and
who, at the same time, are free from barriers:

Hypothesis 4. With respect to a given goal
item, the percentage difference of high producers
between those having a positive and
those having a negative (and/or neutral)
path-goal perception will be greater among
workers who have a high need and are free
than among workers characterized by any
other combination of need and freedom.

Table 4 presents the data which are relevant
to our last hypothesis. The pertinent
comparisons which test this hypothesis are
the comparisons of the percentage differences
between Columns 1 and 2 in relation to the
corresponding differences between Columns 3
and 4, 5 and 6, 7 and 8. Columns 1 and 2
represent workers who have a high need and
who are also free; the other three pairs of
columns respectively represent workers who
have high need but are not free, who have
low need but are free, and who have low need
and are not free. The first column in each
pair represents positive, and the second negative
(and/or neutral) path-goal perceivers.
Hypothesis 4 predicts that, for a given goal
item, the percentage difference between Columns
1 and 2 will in each case be greater
than the corresponding difference from any of
the other three pairs of columns.

Table 4
The Relationship Between Path-Goal Perception and Productivity: When Controlling for
Level of Need and Freedom
(Percentage of High Producers a)

                                                  Path-Goal Perception
                                                       High Need
                                              Free                       Not Free
                                                    Negative                     Negative
                                                    and/or                       and/or
                                      Positive      Neutral        Positive      Neutral
Goal Item                             (1)           (2)            (3)           (4)

Instrumentality of High Productivity
More money in the long run            66% (38)*     22% (37)       40% (57)      39% (79)
Getting along well with work group    40% (10)      39% (66)       44% (16)      43% (126)
Promotion to a higher base rate       46% (35)      44% (39)       [illegible]% (48)  39% (87)

Instrumentality of Low Productivity
More money in the long run            [illegible]   26% (19)       40% (91)      41% (41)
Getting along well with work group    50% (28)      32% (47)       52% (48)      38% (84)
Promotion to a higher base rate       56% (33)*     31% (39)       47% (70)      33% (61)

                                                       Low Need
                                              Free                       Not Free
                                                    Negative                     Negative
                                                    and/or                       and/or
                                      Positive      Neutral        Positive      Neutral
Goal Item                             (5)           (6)            (7)           (8)

Instrumentality of High Productivity
More money in the long run            17% (48)      11% (82)       [illegible]   [illegible]% (146)
Getting along well with work group    40% (15)      21% (115)      [illegible]   [illegible]% (200)
Promotion to a higher base rate       15% (52)      13% (78)       [illegible]   [illegible]% (126)

Instrumentality of Low Productivity
More money in the long run            11% (83)      [illegible]    [illegible]   [illegible]
Getting along well with work group    21% (38)      23% ([illegible])  [illegible]   [illegible]
Promotion to a higher base rate       [illegible]   [illegible]    [illegible]   [illegible]

* These percentage differences are significant, by chi-square test, at the .05 level or better and in the predicted direction.
a Percentages are based on the number in the corresponding parentheses; the complement of each percentage, not appearing
in the table, represents the percentage of low producers in each case.

According to the above specifications,
eighteen separate comparisons, six for each
goal item, are possible on the basis of the
data in Table 4. Half of these comparisons
pertain to the instrumentality of high productivity
and the other half to the instrumentality
of low productivity. The results
are generally as predicted. Thus, considering
the goal of making more money in the
long run, instrumentality of high productivity,
we find that, whereas there is a difference in
high producers of 44% between Columns 1
and 2, the corresponding differences between
Columns 3 and 4, 5 and 6, and 7 and 8, are
only 1%, 6%, and 16%, respectively. In
short, the difference in high producers between
positive and negative path-goal perceivers
is greatest for the high need-free
group workers. In all, 14 of the 18 possible
comparisons yield expected differences, 2 show
zero differences, and 2 comparisons result in
differences in the opposite direction. Considering
the fact that many of the comparisons
involved represent cells of relatively few
cases, Hypothesis 4 receives substantial support
from the findings. The data of Table 4
also support the original path-goal hypothesis:
18 of the 24 possible comparisons between
positive and negative perceivers are
as predicted by Hypothesis 1, there is a zero
difference in one case, and 5 differences are
not as predicted. Therefore, the relationship
between path-goal perception and productivity
remains fairly clear when we control for
level of need and level of freedom simultaneously.


Discussion 


Four hypotheses deriving from the theory 
were supported by the data. The findings in- 
dicate that if a worker sees high (or low) 
productivity as a path to the attainment of 
his personal goals, he will tend to be a high 
(or low) producer. This relationship is more 
pronounced among workers who have a high 
need with respect to a given goal and who 
are free from constraining forces than among 
workers lacking in these characteristics. When 
compared across Tables, the results also show 
that, functionally, high productivity viewed 
by the person as instrumental to the attain- 
ment of a goal does not operate differently 
from low productivity when the latter is 
viewed as impeding goal attainment, and 
vice versa. In both cases, the path-goal per- 
ception variable seems to be a significant de- 
terminant of individual productivity. 

It cannot be expected, however, that all of 
the variance in individual productivity could 
be accounted for in path-goal terms. First, 
this approach deliberately emphasizes the role 
of rational aspects in human behavior while, 
as is known, nonrational aspects are also 
important. Second, a separate attempt at 
multivariate analysis, by means of a modified 
multiple correlation technique, indicated 
that only a modest portion of the variance in 
productivity was explained in path-goal terms. 
Finally, other social psychological variables, 
e.g., group norms, also have a determining 
influence on productivity, especially in situa- 
tions where cooperative effort is essential. 
Therefore, the path-goal approach should be 
considered as supplementing and not as a 
substitute for other useful approaches. 

A comparison of the findings shows that the 
path-goal approach holds perceptibly better 
for the goal item “making more money in the 
long run” than for the items “getting along 
well with the work group” and “promotion to 
a higher base rate.” This suggests that goals 
may function differentially with respect to 
the perception-productivity relationship. In 
this connection, moreover, work with some 
other goal items which, as was earlier indi- 
cated, were of lower relevance to the sample 
than the above three items yielded less clear- 
cut results, confirming the importance of goal 
relevance. Still another problem may arise 
from the fact that several congruent goals, 
regardless of their particular relevance to a 
given population, might be simultaneously at- 
tained by the individual through production 
at a given level. These phenomena require 
further exploration so that a fuller statement 
of the relationship among goals attainable in 
the work situation can be made possible. 

Future improvements might include: (a) 
the development of more adequate measures 
for the concepts of level of need and level of 
freedom; (b) the study of social factors in 
the work situation which affect or determine 
an individual’s path-goal perceptions; (c) a 
study of the implications of goal relevance or 
salience and of goal congruence; (d) efforts 
toward the acquisition of an adequate sam- 
ple of goal items from the variety of signifi- 
cant items which may be found in the work 
situation; and (e) the application of the path- 
goal approach to additional samples and set- 
tings, e.g., nonincentive work populations, un- 
der various conditions. 


Summary 


In this article, we have presented a path- 
goal approach to productivity. In an effort 
to understand the effects of certain social psychological 
factors on individual productivity 
in organizations, we tested four hypotheses 
deriving from the following formulation: If a 
worker sees high (or low) productivity as a 
path to the attainment of one or more of his 
personal goals in the work situation, he will 
tend to be a high (or low) producer, assum- 
ing that his need is sufficiently high, or his 
goal is relatively salient, and that he is free 
from barriers to follow the desired path (high 
or low productivity). The results of this 
study provide support for the predictions 
made and, within limits, indicate the useful- 
ness of a rational approach to the problem 
in question. They provide a clear confirmation 
of the importance of the role of rational 
aspects in the determination of productivity 
behavior and serve to re-emphasize the fact 
that productivity is a function of both fa- 
cilitating and inhibiting forces, forces of an 
individual as well as of a situational charac- 
ter. However, a number of implications of 
the path-goal orientation for the understand- 
ing of what determines productivity level, our 
initial question, still remain to be worked out. 
An Objective Method of Determining Credit 


James D. 
New York 


A nationally known credit house presents 
the problem: “How can we improve our meth- 
ods of investigation?” The methods currently 
in practice were developed over the years by 
a slow process of trial and error. Informa- 
tion regarding individual business firms is 
gathered in the field by men who write sub- 
jective descriptions of these firms. Certain 
things such as capitalization, location, marital 
status, and age of the owner are standard 
items that are nearly always included. Such 
reports are made every year and more fre- 
quently when it is discovered that a change 
in the business has occurred. The field re- 
ports are sent in to the credit house where 
they are analyzed and written up in stand- 
ard form by some competent person. A sub- 
jective estimate of credit is then drawn from 
these standard descriptions. 


Method 


The method demonstrated here, limited by tech- 
niques prescribed by the company, is that of deter- 
mining objective values for specific items reported 
in the business descriptions. Twenty-four hundred 
of these descriptions were used, each one relating to 
a particular business. Eight hundred of the cases 
represented businesses that were definitely successful, 
800 that were definitely unsuccessful, and 800 that 
were bankrupt. No questionable or ambiguous cases 
from the bottom of the successful or top of the unsuccessful 
distributions were included in the study. 
The classification of the business concerns was done 
by the credit house which also took responsibility 
for randomizing the sample, which means, in this 
case, determining the type of business, the part of 
the country, and the urban-rural characteristics of 
the cases studied. As the reports used are part of 
the functional, daily operations of the credit house, 
the identity of the particular business firms is not 
divulged. 

Successful businesses are those that are profitable. 
Unsuccessful businesses are those that avoided actual 
bankruptcy but earned little more than operating 
expenses. Bankruptcy is a legal and objective con- 
dition. 

In order to determine the relative chances of suc- 
cess, failure, or bankruptcy, specific items were then 
taken from the standard descriptions and their occurrences 
noted among the successful, unsuccessful, 
and bankrupt firms. The items used were such 
things as the sex of the owner, his marital status, 
his first, second, and third business venture, the 
number of employees, and capitalization. Percent- 
ages for these items are given in Table 1. Also re- 
ported in Table 1 are t scores which indicate the re- 
liability of the difference between the percentages 
(1). 
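
The reliability score mentioned above can be illustrated with a minimal sketch. The article cites its own source (1) for the exact procedure and does not print the formula, so the pooled-proportion standard error used here is an assumption (a common textbook form), and the function name and the percentages in the example are hypothetical.

```python
import math


def percentage_difference_score(p1, p2, n1, n2):
    """Critical ratio for the difference between two independent percentages.

    p1 and p2 are percentages (0 to 100); n1 and n2 are the group sizes.
    The pooled standard error is an assumed, textbook form; the article
    does not state which formula it used.
    """
    q1, q2 = p1 / 100.0, p2 / 100.0
    pooled = (q1 * n1 + q2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; the item does not vary")
    return (q1 - q2) / se


# Hypothetical item: reported for 20% of 800 successful firms
# and for 12% of 800 bankrupt firms.
score = percentage_difference_score(20.0, 12.0, 800, 800)
print(round(score, 2))
```

A larger absolute score means the item separates the two groups more reliably; an item whose score falls below the chosen confidence level would be left out of the credit statement.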

It can be noted in Table 1 that the field investi- 
gators did not get information on every item every 
time. For example, the married and single cases 
combined should represent 100% of the cases. This 
is not true of all the samplings, indicating that the 
field observers sometimes failed to report whether 
the owners were married or single. Omissions, how- 
ever, are on a chance basis, and it is believed that 
they do not seriously impair the demonstration of 
the method. 


Results 


The ratios obtained in Table 1 can now be 
used in two ways: first, to establish objective 
credit statements for individual houses, 
and second, to test the broad generalizations 
which the “expert” analysts, using more sub- 
jective methods, were continually making. 

In preparing an objective credit statement, 
a standard descriptive form of a particular 
business is taken and the ratio value is found 
in Table 1 for the items carried. However, 
only those items are used that are discriminating. 
Being married, for example, is not a 
reliable indicator, so is not used. Being 
single, on the other hand, does have some 
relatively reliable, unfavorable implications, 
and so would be included. For example, a 
particular owner is 70 years old, employs 
seven people, and has a $15,000 net worth 
ascribed to his business. Values are obtained 
by observing the proportional occurrences of 
these items among the successful, unsuccess- 
ful, and bankrupt. All this material is in 
Table 1. 

If these data in Table 1, the 800 success- 
ful, 800 unsuccessful, and 800 bankrupt firms, 
are now referred to all the firms studied by 
the field observers (that is, the total sample 
from which they were taken), they can be 
further weighted to arrive at a prediction of 
Table 1 

Percentages and t Scores of Items as Found Among Successful (S), Unsuccessful (U), 
and Bankrupt (B) Businesses a, b 

[Only the row labels of this table survive; the percentage and t-score columns are missing.] 

Sex: male; female 

Age: 20-24; 25-29; 30-34; 35-39; 40-44; 45-49; 50-54; 55-59; 60-64; 65; 70 

Marital status: single; married 

Business venture: first; second; third 

Number of employees 

Family assistant 

Fire record: 0; fire started on premises; fire on other premises 

Net worth: -499; 500-999; 1,000-1,999; 2,000-2,999; 3,000-4,999; 5,000-9,999; 
10,000-19,999; 20,000-34,999; 35,000-49,999; 50,000-74,999; 75,000-124,999; 
125,000-249,999 

a A dash indicates reliability below the 10% level of confidence. 
b The t scores indicate the reliability of the difference between two percentages. 
Table 2 

The Chances in Percentages for the Success of a Particular Business 
(Data from Table 1) 

Characteristics                               Successful    Unsuccessful 
Age 70                                                            2 
Employees 7                                                       2 
Net worth $15,000                                                13 
Unweighted chances for success or failure                        17 
Sampling multiplier 
Weighted prospects for success or failure        31.20            1.70 
Reduced to a base of one                        34,663           1,888 


success or failure for the year. If all bank- 
rupt houses totaled 1% of the cases for a 
given year, the unsuccessful without bank- 
ruptcy totaled 10%, borderline cases, 9%, 
and successful business houses, 80%, we can 
reach a final weighting and statement of probable 
success or failure as given in Table 2. 
(The borderline cases, amounting to 9%, as 
described in the “Method” section, represent 
businesses that fall between the “bottom” of 
the distribution of successful businesses and 
the “top” of the distribution of unsuccessful 
businesses; that is, their evaluation was so ambiguous 
that the credit house did not wish to 
classify them.) This statement should be revised 
whenever field observations are made. 
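
The weighting just described can be retraced as a short sketch. The multipliers (80, 10, and 1) are the shares of all firms given in the text. The 17 per cent for the unsuccessful class matches the weighted value of 1.70 in Table 2; the 39 per cent for the successful class and the 0.09 per cent for the bankrupt class are not legible in the table and are back-solved here as assumptions, chosen so that the results come close to the printed last row. The function names are hypothetical.

```python
def weighted_chances(unweighted_pct, multipliers):
    """Weight each class's summed percentage by its share of all firms."""
    return {
        name: unweighted_pct[name] / 100.0 * multipliers[name]
        for name in unweighted_pct
    }


def reduce_to_base_of_one(weighted, base_class):
    """Express every weighted value as a multiple of the base class."""
    base = weighted[base_class]
    if base == 0:
        raise ValueError("the base class has a weighted value of zero")
    return {name: value / base for name, value in weighted.items()}


# Shares of all firms in a given year, as stated in the text.
multipliers = {"successful": 80, "unsuccessful": 10, "bankrupt": 1}

# 17 is consistent with Table 2; 39 and 0.09 are back-solved assumptions.
unweighted = {"successful": 39.0, "unsuccessful": 17.0, "bankrupt": 0.09}

weighted = weighted_chances(unweighted, multipliers)
odds = reduce_to_base_of_one(weighted, "bankrupt")

for name in ("successful", "unsuccessful", "bankrupt"):
    print(name, round(weighted[name], 4), round(odds[name]))
```

Under these assumed inputs the weighted values for the successful and unsuccessful classes are 31.20 and 1.70, and the reduced values fall within a few units of the figures in the last row of Table 2.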

If a firm were to allow credit to the con- 
cern described in Table 2 it would take 34,000 
chances that the business in question is a 
stable successful business, to 1,800 that it is 
in an unprofitable, unsuccessful condition, to 
1 that it has the characteristics of bankruptcy. 
These chances, as with every credit state- 
ment, relate to the circumstances on the day 
when the field observations were made. The 
average deterioration until the observations 
are revised is not known objectively, although 
it could and should be determined. It is one 
of the assumed factors on which the credit 
house now stakes its reputation. Experience 
indicates that yearly revision of business- 
house descriptions, with the local field observer 
left responsible for detecting and reporting 
any important interim changes, is 
relatively satisfactory. 

It must be remembered also that this statement 
illustrates a method: With more items, 
and more significant items reported 100% of 
the time, in various samplings of businesses 
of different types, the credit of every size 
and type of business could be established. An 
example of an item that might be added is an 
evaluation of the education and experience of 
the owner. 

The second use of the data in Table 1, to 
test the broad generalizations made by the 
“experts” using relatively subjective methods, 
can be illustrated by the statements below. 
The level of confidence to be attributed to 
these statements can be determined by examining 
the t scores in Table 1. 

1. Prospects of success become greater with 
higher capitalization; that is, $10,000 ap- 
pears to represent a critical area where pros- 
pects of success begin to become definitely 
brighter. Examination of Table 1 will indi- 
cate that, in this sampling, the first part of 
the generalization dealing with improving sta- 
bility with greater capitalization is sounder 
than the part indicating that $10,000 is a 
significant area. 

2. The number of employees from zero to 
seven is a partial indication of the degree of 
success. The more employees in this group- 
ing, the greater is the probability of success. 
The objective results here are ambiguous and 
do not clearly support this statement. It 
would be better to say that three employees 
or less is an unfavorable indication, while 
four to seven is favorable. 

3. There does not appear to be anything 
very significant in the fire record. The ob- 
jective results here indicate that there is no 
significance in a business operating without 
any fires. There is no significance in fires 
that start on other premises, but fires that 
start on the premises in question occur more 
frequently with the successful business. 

4. There appears to be some personality 
factor in bankruptcy. Women and elderly 
men are more apt to be unsuccessful without 
bankruptcy. This statement is not true for 
the men, but is true for women. 
5. The married are slightly better risks 
than the single. Objective results here indi- 
cate that the married appear without reliable 
differences among the successful, unsuccess- 
ful, and bankrupt. The single, however, ap- 
pear slightly more often among the unsuccess- 
ful or bankrupt. 

6. In the capitalization range studied a 
family assistant, i.e., one who works but is 
not on the payroll, is a slightly unfavorable 
indication. The family assistant is also 
slightly more characteristic of the unsuccess- 
ful than the bankrupt. 

By taking the same data given in Table 1 
and recombining it in various ways, such as 
combining marital status with different capi- 
talization, or with various numbers of em- 
ployees, still further generalizations can be 
checked. 


Conclusions 


1. By breaking a subjective description of a 
business into individual items which can be 
treated quantitatively, it is possible to de- 
velop objective methods of evaluating credit. 
These items might be directly punched in 
cards by field observers, eliminating writing 
and rewriting. 

2. By requiring 100% reporting of items 
under investigation, such items could be vali- 
dated and properly weighted. This would 
benefit both expert evaluation and objective 
estimates. 

3. By observing more items in the field 
studies, greater reliability could be given to 
the conclusions. Examples of other items 
that might be evaluated are such personality 
factors as: social affiliations of the owner, 
owner's success in school and college, own- 
er’s age at graduation (or age-grade ratio 
when leaving school), occupation of owner's 
parents, and other such items that might be 
tested. Items closer to the business might 
include the nature of the recording system, 
the neatness of the premises, cost figures, 
credit policies, overhead, discounts, collection 
period, number of years in business, and the 
like. 

4. As a means of making the estimate of 
“unsuccessful” and “successful” more reliable, 
the credit house should predict the prosperity 
or failure of particular firms over various pe- 
riods of time. These predictions would ob- 
viously not be published, but used to check 
and improve the evaluation of specific items. 
If education or training has any value in de- 
termining a man’s credit rating, that value 
would be more stable and allow longer term 
predictions than such items as “number of 
employees.” 

5. With standardized, objective items, it 
would be possible to program credit studies 
on electronic computers and vastly increase 
the number of factors considered in establish- 
ing credit, without loss of time. Different 
types of businesses and many variants in busi- 
ness environment could also be more quickly 
and thoroughly investigated. 
The job requirements of any given job can 
be thought of as the personal characteristics 
which the job requires on the part of incum- 
bents for reasonably satisfactory performance. 
Job requirements typically are established for 
individual jobs, this being done either on the 
basis of “judgment” or on the basis of a sta- 
tistical validation procedure. Not often have 
circumstances developed in which it has been 
possible to take an across-the-board look at 
the job requirements for large samples of jobs 
in order to determine what patterns they 
form. This study was carried out in order to 
determine the patterns of job requirements 
of a large, relatively representative sample of 
jobs. 

In general terms, the study included the fol- 
lowing steps: a factor analysis for a sample 
of jobs of a large number of worker variables, 
the derivation for each job of a factor score 
on each of the resulting factors, the grouping 
of jobs into “levels” on each factor, and the 
classification of the sample jobs into patterns 
in terms of the permutations of all possible 
factor score levels. 
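
The last step, classifying jobs by the permutations of factor score levels, can be sketched as follows. The sketch assumes what the article reports further on: seven factors, with three levels for the first and two for each of the others, which gives 3 x 2^6 = 192 possible patterns. The job names and level assignments are hypothetical.

```python
from collections import Counter
from itertools import product

# Assumed from the article's later description: three levels for the
# first factor, two levels for each of the remaining six.
LEVELS_PER_FACTOR = [3, 2, 2, 2, 2, 2, 2]


def all_patterns(levels_per_factor):
    """Enumerate every possible combination of factor score levels."""
    return list(product(*[range(1, n + 1) for n in levels_per_factor]))


def count_jobs_by_pattern(job_levels):
    """Tally how many jobs share each pattern of levels."""
    return Counter(tuple(levels) for levels in job_levels.values())


patterns = all_patterns(LEVELS_PER_FACTOR)
print(len(patterns))

# Hypothetical jobs, each with one level per factor.
jobs = {
    "job A": (1, 1, 2, 1, 2, 1, 2),
    "job B": (1, 1, 2, 1, 2, 1, 2),
    "job C": (3, 2, 1, 2, 1, 2, 1),
}
counts = count_jobs_by_pattern(jobs)
print(counts.most_common(1))
```

Patterns that attract many jobs are the ones of practical interest; most of the 192 possible combinations would be expected to hold few jobs or none.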


The Basic Data 


The data for this phase of the program were 
made available to the Occupational Research 
Center by the U. S. Employment Service. 
These data were furnished in the form of 
IBM cards. The data had been developed 
by the U. S. Employment Service in connection 
with a broader research project that was 
being carried out. 

¹ This study was carried out by the Occupational 
Research Center, Purdue University, under the provisions 
of a subcontract between the Purdue Research 
Foundation and the U. S. Employment Service, Bureau 
of Employment Security, U. S. Department of 
Labor, with funds provided by the Department of 
the Air Force (Air Force R & D Project No. 200 
003-0010). 


The Sample of Jobs 


The sample of jobs consisted of 4,000 jobs 
from the Dictionary of Occupational Titles 
(7). This sample had been selected by the 
U. S. Employment Service, and was an ap- 
proximately representative sample from the 
Dictionary in terms of major occupational 
groups.² 

² A listing of the jobs is included in Estimates of 
worker trait requirements for 4,000 jobs, U. S. Employment 
Service, Bureau of Employment Security, 
U. S. Department of Labor, available from the Superintendent 
of Documents, Washington 25, D. C. 
Price $2.25. The distribution of the sample of jobs 
by major occupation groups of the Dictionary of 
Occupational Titles follows (percentages in parentheses 
are the percentages of jobs in the Dictionary in 
the major occupational groups): Professional and 
managerial, 11.25% (10%); Clerical and sales, 8.75% 
(5%); Service, 3.13% (3%); Agricultural, fishery, 
forestry, 1.87% (2%); Skilled, 31.25% (15%); 
Semiskilled, 26.25% (38%); Unskilled, 17.50% 
(27%). 


The Variables 


Forty-four specific variables, falling in six 
major classes, were used in the study. These 
six classes, plus two others (Work Performed 
and Industry), had been selected for investigation 
by the U. S. Employment Service 
through a series of conferences including both 
intra-agency personnel and outside professional 
consultants. The six classes of variables 
used in this particular study were those 
which are primarily worker oriented as opposed 
to job oriented, in that they reflected 
the worker attributes that presumably are 
differentially pertinent to success on various 
jobs. They might then be thought of as job 
requirements. The development of the procedures 
for rating the variables was carried 
out by the U. S. Employment Service staff. 
Following is a listing of the variables, with 
notes regarding the numerical scales used for 
them: 


Training Time: 
1. General educational development (7-point scale: 7 = high; 1 = low) 
2. Specific vocational preparation (9-point scale: 9 = high; 1 = low) 

Aptitudes: (For all aptitudes but “Talents” a 5-point 
scale was used: 5 = high; 1 = low). 
3. Verbal 
4. Numerical 
5. Spatial 
6. Form perception 
7. Clerical perception 
8. Motor coordination 
9. Finger dexterity 
10. Manual dexterity 
11. Eye-hand-foot coordination 
12. Color discrimination 
13. Intelligence 
14. Talents (1 = present; 0 = absent) 

Physical Capacities: (For all but “Strength” a dichotomy 
was used: 1 = present-important; 0 = not present-not important). 
15. Strength (5-point scale: 5 = very heavy; 1 = sedentary) 
16. Climbing and balancing 
17. Stooping and kneeling 
18. Reaching 
19. Talking and hearing 
20. Seeing 

Temperaments: (These can be considered as the 
adaptability to the specified conditions or circumstances. 
Two “most important” temperaments 
selected for each job: 1 = selected; 0 = not selected). 

Adaptability to conditions or circumstances that involve: 
21. Variety of duties 
22. Repetitive or short cycle operations 
23. Little or no opportunity for independent action or judgment 
24. Direction, control and planning of activities 
25. Dealing with people 
26. Working in physical isolation from others 
27. Influencing people in their ideas and opinions 
28. Performing under stress 
29. Evaluation of information against judgmental criteria 
30. Evaluation of information against measurable criteria 
31. Interpretation of feelings or ideas in terms of personal viewpoint 
32. Precise attainment of set limits, tolerances or standards 

Interests: (These can be considered as interests in 
the specified activities. Each pair of bipolar interests 
was treated on a 3-point scale: 3 = first 
bipolar interest listed; 2 = neither bipolar interest 
selected; 1 = second bipolar interest listed. 
Two interests were identified for each job, 
namely, those two that were considered “most 
important”; these were coded as 3 or 1 for their 
respective bipolar interest pairs, and all other 
interests were coded 2). 

Interests in: 
33. Things and objects vs. People and ideas 
34. Business contact vs. Scientific 
35. Routine, organized activities vs. Abstract, creative activities 
36. Social welfare activities vs. Nonsocial activities 
37. Prestige satisfactions vs. Tangible, productive satisfactions 

Working Conditions: (For all but “Outside-inside” a 
dichotomy was used: 1 = present-important; 0 = not present-unimportant). 

The working conditions variables were of a 
somewhat different nature than the others since 
they characterize, specifically, more the work 
environment than worker attributes. They were 
included in the study, however, since there seems 
to be a direct translation from a working condition, 
as such, to the adaptability of people to 
the working condition. Thus, these variables 
may be thought of more as adaptability to the 
conditions, rather than as the conditions themselves. 

Adaptability to conditions characterized in terms of: 
38. Outside-inside (3 = outside; 2 = outside and inside; 1 = inside) 
39. Extremes of cold 
40. Extremes of heat 
41. Wet and humid 
42. Noise and vibration 
43. Hazards 
44. Fumes and toxic conditions 

The Analysis of the Sample Jobs 


The 4,000 jobs had been analyzed in terms 
of these variables by U. S. Employment Service 
job analysts. The manuals used in this 
process were heavily illustrated with bench- 
mark jobs to characterize the various vari- 
ables or degrees thereof. 


Procedures 


Correlations were computed to develop the 
intercorrelation matrix. In the case of the 
continuous variables, product-moment correlations 
were used, and, in the case of dichotomous 
variables, point biserial correlations 
were used.³ The factor analysis was 
carried out by the use of IBM equipment, 
using Hotelling’s principal components method 
(4), as adapted to IBM by Hestenes and 
Karush (3). The factor problem was stated 
in terms of eigenvectors presented in Holzinger 
and Harman (5). Communalities were 
estimated by Thurstone’s miniature matrix 
method (6). The factors were rotated in 
terms of a reference vector solution to a cri- 
terion of simple structure. The reference 
axes were allowed to become oblique to each 
other. The rotation procedure was carried 
out using Cattell (1) as a guide. 
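
The two kinds of correlation named above can be compared in a minimal sketch. The point-biserial coefficient is the product-moment coefficient computed with one variable coded 0 and 1, so both functions return the same value for the same data. The function names and the six illustrative jobs are hypothetical; the code is not a reconstruction of the IBM procedure the authors used.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


def point_biserial_r(dichotomy, scores):
    """Textbook form: (M1 - M0) / s * sqrt(p * q), with s the population SD."""
    n = len(scores)
    group1 = [s for d, s in zip(dichotomy, scores) if d == 1]
    group0 = [s for d, s in zip(dichotomy, scores) if d == 0]
    if not group1 or not group0:
        raise ValueError("both categories of the dichotomy must occur")
    p = len(group1) / n
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    m1, m0 = sum(group1) / len(group1), sum(group0) / len(group0)
    return (m1 - m0) / sd * math.sqrt(p * (1.0 - p))


# Hypothetical ratings for six jobs: a 0/1 variable and a 5-point variable.
climbing = [1, 0, 1, 1, 0, 0]
strength = [4, 1, 5, 3, 2, 2]

r_pb = point_biserial_r(climbing, strength)
r_pm = pearson_r(climbing, strength)
print(round(r_pb, 3), round(r_pm, 3))
```

Because the two coefficients coincide, a single product-moment routine applied to the 0 and 1 codes is enough to fill the whole intercorrelation matrix.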

Upon completion of the rotation process, 
the factor matrix (actually a reference vector 
matrix in this case) was expressed in terms 
of correlations between variables and refer- 
ence vectors. In order to interpret the fac- 
tors and to estimate these factors, it was 
necessary to convert this reference vector 
matrix into a true factor matrix expressed in 
terms of correlations between variables and 
factors. By a personal communication, Cat- 
tell (2) provided the equation necessary to 
produce this transformation. 


The Factors 


Any interpretation placed on the factors re- 
sulting from any factor analysis is of course 
just an attempt to summarize the meaning 
expressed by the patterns of correlations be- 
tween variables and factors, taking into ac- 
count the nature of the variables used and 
the way in which the data were obtained. 
The process of interpretation is strictly a logi- 
cal one and is not empirical. 

In the present factor analysis there is an 
additional special consideration that is pertinent 
to the interpretation of the factors. 
Most factor analyses in the area of psychol- 
ogy have been carried out with data describ- 
ing psychological phenomena with respect to 
human beings; the factors resulting from such 
studies, therefore, usually would be interpreted 
with regard to psychological characteristics 
of human beings. In the present 
study, however, the factors derived describe 
the combinations of the requirements of jobs 
in terms of worker attributes, and not basic 
human attributes. Any resulting factor could 
thus incorporate variables of very different 
natures (as far as human traits are concerned), 
and would not necessarily embody 
variables that have the same common denominator 
of a single, basic human attribute. 

³ It was the judgment of the investigators that the 
point-biserial correlation would be the most justifiable 
one to use with the dichotomous variables. 

In view of the fact that the factors result- 
ing from this study would not be expected to 
have the same kind of psychological meaning 
as those resulting from more conventional fac- 
tor analyses, the naming of the factors pre- 
sented a problem. While names were selected 
that were considered the best ones to char- 
acterize the factors, it is granted that they 
do not reflect entirely adequately the gamut 
of variables that emerged for the individual 
factors. 

Seven factors emerged. The names that 
were given to them are listed later in the 
article. 


Factor Estimation 


With the factors identified, attention was 
then directed toward the development of pat- 
terns of job requirements in terms of these 
factors. In order to do this, it was first de- 
sirable to obtain factor scores for each job 
for each of the seven factors. For this 
purpose, the Wherry-Doolittle test selection 
method was used in order to identify, for 
each factor, the variables that best estimated 
the factor. In the process, the correlations 
of the variables with the factor in question 
were used along with the original intercorre- 
lation matrix for the variables. The Wherry- 
Doolittle process was iterated to select the 
four variables that best estimated the factor.⁴ 
Using the four variables selected for each 
factor, a regression equation was developed 


⁴ The process was stopped after the fourth variable 
for several reasons. For example, usually variables 
after the third or fourth do not add much to 
multiple prediction. In addition, because of the 
large sample (N = 4,000), a significance test would 
probably indicate significant additional prediction for 
extremely small increments, although such incre- 
ments might have little practical value. Further, in 
terms of practical machine operations a limit of four 
variables was desirable. 
Table 1 

Factors Resulting from Analysis, and Four Variables Selected for Each by Wherry-Doolittle 
Method, with Regression Weights 

(The variables selected were used in deriving factor scores for each job) 

[Signs and some digits of the weights for Factors 1, 2, and 4 are illegible in the source; 
the weights for Factors 5 to 7 are missing. Variable types and numbers follow the listing 
of variables given in the text.] 

Factor                                    Type a  No.  Variable name                                        Regression weight b 

1. Mental and Educational Development     TT       1   General educational development                        2.20 
   vs. Adaptability to Routine            TT       2   Specific vocational preparation                        1.28 
                                          A       13   Intelligence                                           4.2 
                                          T       22   Repetitive or short cycle operations                  15.3 

2. Adaptability to Precision Operations   T       32   Precise attainment of set limits, tolerances          37.12 
                                                       or standards 
                                          I       35   Routine, organized activities vs. Abstract,          + 3.62 
                                                       creative activities 
                                          I       36   Social welfare activities vs. Nonsocial              + 3.79 
                                                       activities 
                                          I       37   Prestige vs. Tangible, productive satisfactions      + 3.55 

3. Body Agility                           A       11   Eye-hand-foot coordination                           + 4.31 
                                          PC      16   Climbing and balancing                              + 14.53 
                                          PC      17   Stooping and kneeling                                + 6.89 
                                          WC      38   Inside-outside                                       + 3.18 

4. Artistic Ability and Esthetic          A       12   Color discrimination                                   1.48 
   Appreciation                           A       14   Talents                                             + 36.51 
                                          T       29   Evaluation of information against judgmental           5.48 
                                                       criteria 
                                          T       31   Interpretation of feelings or ideas in terms          36.25 
                                                       of personal viewpoint 

5. Manual Art Ability                     A        8   Motor coordination 
                                          A        9   Finger dexterity 
                                          A       10   Manual dexterity 
                                          I       37   Prestige vs. Tangible, productive satisfactions 

6. Personal Contact Ability vs.           TT       1   General educational development 
   Adaptability to Routine                PC      18   Reaching 
                                          PC      19   Talking and hearing 
                                          T       25   Dealing with people 

7. Heavy Manual Work vs. Clerical         A        7   Clerical perception 
   Ability                                PC      15   Strength 
                                          I       36   Social welfare activities vs. Nonsocial activities 
                                          WC      44   Fumes and toxic conditions 

a Legend of type of variable: TT, Training Time; A, Aptitudes; PC, Physical Capacities; 
T, Temperaments; I, Interests; WC, Working Conditions. 

b For purposes of information, the constants for the factors were: 1-44.50; 2-50.59; 3-49.36; 
4-49.03; 5- ...; 6- ...; 7-68.50. 
Table 2 

Composition by Factor Score Levels of Patterns With More Than 100 Jobs 

(H = High; A = Average; L = Low)* 

[The grid of H, A, and L entries by pattern and factor did not survive. Pattern numbers 
still legible: 8, 9, 32, 50, 51, 54, 55, 88, 101, 102, 109.] 

No. of Jobs    Example of Job 
201            Test-engine mechanic 
199            Precision lens grinder 
124            Scaffold builder 
231            Editor, newspaper 
166            Coremaker, floor 
116            Chemical mixer 
370            Chair upholsterer 
233            Key-punch operator 
126            Chipper, foundry 
478            Conveyor loader 
118            Meter reader 
125            Coal passer 

* For Factor 1 there are three factor score levels (High, Average, and Low); for all other 
factors there are only two (High and Low). 


that incorporated appropriate b weights and 
a constant.⁵ 

The four variables selected for each factor 
are shown in Table 1, along with their re- 
spective b weights. 

Derivation of factor scores. Factor scores 
were then computed for each of the 4,000 
jobs, one score for each of the seven factors. 
This was done by using the derived b weights 
and the original scores for the selected vari- 
ables. 
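
The computation of a factor score from the regression equation can be sketched as below. The four b weights are those that appear for Body Agility in Table 1; the constant of 30.0 and the job's ratings are hypothetical, chosen only to make the arithmetic visible, and the function and variable names are illustrative.

```python
def factor_score(constant, b_weights, variable_scores):
    """Regression estimate: constant plus the sum of b weight times rating."""
    missing = set(b_weights) - set(variable_scores)
    if missing:
        raise KeyError(f"job has no rating for: {sorted(missing)}")
    return constant + sum(
        b * variable_scores[name] for name, b in b_weights.items()
    )


# b weights as they appear for Factor 3 (Body Agility) in Table 1.
BODY_AGILITY_WEIGHTS = {
    "eye_hand_foot_coordination": 4.31,
    "climbing_and_balancing": 14.53,
    "stooping_and_kneeling": 6.89,
    "inside_outside": 3.18,
}

# Hypothetical constant; not the value used in the study.
HYPOTHETICAL_CONSTANT = 30.0

# Hypothetical ratings for one job on the four selected variables.
job = {
    "eye_hand_foot_coordination": 3,  # 5-point aptitude scale
    "climbing_and_balancing": 1,      # dichotomy: present
    "stooping_and_kneeling": 1,       # dichotomy: present
    "inside_outside": 2,              # outside and inside
}

score = factor_score(HYPOTHETICAL_CONSTANT, BODY_AGILITY_WEIGHTS, job)
print(round(score, 2))
```

Repeating the call once per factor, each with its own four variables, weights, and constant, yields the seven factor scores for a job.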

Factor scores and factor score levels. A 
factor score distribution was then prepared 
for each of the seven factors, giving the fre- 
quency of jobs for each factor score. These 
factor score distributions were then divided 
into “levels”; for all of the factors but the 
first this consisted of a division into two lev- 
els (1 and 2); since the first factor seemed to 
be the one of greatest importance, its range 
was divided into three levels (1, 2, and 3). 

In the division of the factor score ranges 
into levels, cutoffs were made using the cri- 
teria of: (a) “natural” breaking points in the 
range; and (b) approximate equality of the 
number of jobs in each level if there were no 
“natural” breaking points. Thus, each job 

° The factor or criterion means and standard de- 
viations used in deriving the b weights were for a 
standardized distribution with a mean of 50 and a 
standard deviation of 10. 


had seven factor scores of 1 or 2 (or, in the 
case of the first factor, of 1, 2, or 3). 

Following is a listing of the factors, with 
an example of a job in each factor score 
level: ° 


1. Mental and Educational Development vs. Adapta- 
bility to Routine 

High (1) Metallurgist, Physical 
Average (2) Boilermaker, Maintenance 
Low (3) Laborer, Warehouse 

2. Adaptability to Precision Operations
High (1) Wheel Alignment Mechanic
Low (2) County Agricultural Agent

3. Body Agility
High (2) Plumber
Low (1) Pay Station Attendant

4. Artistic Ability and Esthetic Appreciation
High (1) Photographer, Commercial
Low (2) Teamster

5. Manual Art Ability
High (2) Precision Lens Grinder
Low (1) Airways Observer




® The entries in parentheses shown for each factor 
are the factor score levels (1 and 2; or 1, 2 and 3); 
low entries are associated with low numerical factor 
scores and vice versa. Whether a low numerical 
score is associated with a low or high degree of the 
factor is largely a function of the scoring system 
used with the variables, and of the signs of the vari- 
ables in the regression equation. The terms high and 
low, as used in this list, apply to the relative degrees 
of the factor listed, not to the numerical values of 
the factor scores. In entering the job examples, the 
one with the “high” degree of the factor is the first 
one listed. 
6. Personal Contact Ability vs. Adaptability to Routine
High (2) Cashier, Front Office 
Low (1) Rag Sorter 
7. Heavy Manual Work vs. Clerical Ability 
High (1) Laborer, Wharf
Low (2) Auditor 


Development of Patterns 


The seven factor scores for each job were 
punched into IBM cards. These cards were 
then sorted into all possible permutations of 
factor score levels on all factors. Each such 
permutation would then be unique in that it 
would incorporate a distinct combination of 
factor score levels. Each of these permuta- 
tions was then considered to represent a “pat- 
tern” of job requirements. 
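
The card sort amounts to grouping jobs by their tuple of seven factor score levels. A minimal sketch of that idea follows, assuming three levels for Factor 1 and two levels for each of Factors 2 through 7, as described earlier; the three jobs and their level codes are hypothetical. The number of possible combinations it prints, 3 x 2^6 = 192, agrees with the total given in the Results.

```python
from collections import Counter
from itertools import product

# Factor 1 has three levels; Factors 2-7 have two levels each.
levels_per_factor = [(1, 2, 3)] + [(1, 2)] * 6

# Every possible combination of levels across the seven factors.
all_patterns = list(product(*levels_per_factor))
print(len(all_patterns))  # 3 * 2**6 possible patterns

# Hypothetical jobs, each coded as seven factor score levels.
jobs = {
    "job A": (1, 2, 1, 2, 1, 2, 1),
    "job B": (1, 2, 1, 2, 1, 2, 1),
    "job C": (3, 1, 1, 2, 2, 1, 2),
}

# Jobs with identical level tuples fall into the same pattern.
jobs_per_pattern = Counter(jobs.values())
for pattern, n in jobs_per_pattern.items():
    print(pattern, n)
```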


Results 


The permutations of factor score levels 
would make possible a total of 192 unique 
patterns. Actually, the jobs fell into only 
115 of the patterns. For the patterns into 
which jobs did fall, there was very wide vari- 
ability in the number of jobs in the different 
patterns, ranging from one job per pattern to 
478 in another pattern. Table 2 lists the pat- 
tern composition of all of the patterns with 
more than 100 jobs, and also shows the num- 
ber of jobs in each of these patterns. An ex- 
ample of one job is given for each of these 
patterns. 

Table 3, in turn, shows the distribution of
patterns by job frequency. The class intervals
in this table (jobs per pattern) characterize
the patterns in terms of the number of jobs
falling in the individual patterns. This shows
for the 115 patterns that 82 patterns had 20
jobs or fewer, that 8 patterns had between 21
and 40 jobs, etc.

By looking at the cumulative columns for
both patterns and jobs, it will be observed
that 12 patterns accounted for 60% of the
jobs, 20 patterns for 75% of the jobs, and 33
patterns for 88% of the jobs. It thus appears,
in terms of patterns of job requirements, that
there are very heavy concentrations of jobs in
relatively few patterns. To put it another way,
it seems that jobs collectively do not scatter
themselves to the four winds as far as job
requirements are concerned, but rather tend to
fall into certain predominant molds. This, of
course, has implications for placement,
vocational guidance, and other purposes.

A note of explanation, however, is in order 
here. These concentrations are in terms of 
numbers of different jobs, and not in terms of 
numbers of people on the jobs. It is possible, 
for example, that a job that has a pattern 
unique to itself could be a very important job 
as far as number of incumbents on the job is 
concerned.


Limitations of Study 


Certain possible limitations need to be kept
in mind in interpreting the results of this
study. In the first place, the original data
were based on ratings and not on statistically
validated analyses; there is some tangential
evidence, however, that the ratings have
reasonable validity. Further, the results are of
course a consequence of the procedures used
in the analysis of the jobs. Particular attention
is called to the fact that in the ratings of
Temperaments and Interests the two of each
that were considered most important were
identified for each job; it is of course possible
that some jobs might have more, or fewer, than
two important Temperaments or Interests.


Table 3

Distribution of Patterns by Job Frequency

                  No. of Patterns          No. and Percentage of Jobs
Jobs Per        By Class                   By Class
Pattern         Interval    Cumulative     Interval    Cumulative No.
1-20               82          115            484          4000
21-40               8           33            251          3516
41-60               5           25            230          3265
61-80              --           20            285          3015
81-100             --           --            343          2730
101-200            --           12            974          2387
201-300            --           --            935          1413
301 and over       --           --            478           478

[-- = entry not legible in the source; the percentage columns are likewise not legible.]


Summary and Conclusions 


This study involved both a factor analysis
of job variables and the development of
patterns of job requirements in terms of these
factors. The major results can be summarized
as follows:

1. The factor analysis of 44 variables resulted
in the emergence of seven factors that
may be thought of as job requirement factors.

2. The classification of jobs into patterns
of job requirements (in terms of factor score
level) revealed a strong concentration of jobs
in a very limited number of the various
possible patterns.


Received August 6, 1956. 
Relationship of Rhythm Discrimination to Motor Rhythm
Performance 


Olin W. Smith ¹


Cornell University 


The Seashore Measures of Musical Talent 
(4) include two subtests—auditory time and 
rhythm discrimination. The inclusion of these 
tests in a battery designed to predict musical 
performance requires the assumption that they 
correlate positively and significantly with the 
motor performance of corresponding tempo- 
ral or rhythmic tasks (the performance of 
short time intervals which may be either equal 
or of mixed values such as are normal to 
rhythms). The evidence for the above as- 
sumption is at present unsatisfactory. 

R. H. Seashore attempted to establish the 
relationship between time and rhythm dis- 
crimination and motor rhythm performance 
by correlating time and rhythm tests with a 
motor rhythm task performed at two differ- 
ent rates. He obtained significant correla- 
tions, but his comments on the administration 
of the motor tests vitiate the significance of 
his r’s. He states, “If the record shows a 
systematic error of anticipation or delay, the 
observer is told of this and is given oppor- 
tunity to correct it” (5, p. 146). He men- 
tions that learning can correct up to 40% of 
the error but gives no standards for an ac- 
ceptable performance or for one needing cor- 
rection. In an article published 25 years later 
(6), Seashore further complicated the situa- 
tion by reporting that performance scores on 
a motor rhythm pattern did not correlate sig- 
nificantly with scores on two other motor 
rhythm patterns when the same apparatus 
was used for all three. This, of course, throws 
doubts upon the generality of performance 
scores from a single task. Consequently, the
present study was devised not only to repeat
Seashore’s experiment, but also to permit a
more general analysis of the interrelationships
of auditory rhythm and time discrimination
and their corresponding motor performance.


1 The writer is indebted to T. A. Ryan for his
advice and counsel, to Florence Jackson and Dorothy
Hubbard for their computational assistance, and to
Frank Rosenblatt for the IBM analysis.


Tests 


The test battery included: (a) the Seashore
time discrimination (TD) and (b) rhythm
discrimination (RD) tasks, Series A and B (4);
(c) a rhythm performance (RP) test, in which
the motor task was to tap a telegraph key in
coincidence with auditory rhythm signals
(following) and then to continue tapping out
the rhythm (reproducing) until told to stop;
and (d) an equal intervals performance (EIP)
test, in which five short intervals were
presented for performance in two series, A and
B. Series A preceded the motor rhythm tasks;
Series B followed them.

The first rhythm was the one established 
by Fraisse (1) to be the most “natural” 
motor rhythm. Seven other rhythms were 
chosen from the Seashore Series A and B ac- 
cording to the proportions of correct responses 
by the adult standardizing group. Table 1 
lists the time values assigned to the rhythms 
and also their order of presentation. Each of 
the eight rhythms was repeated at three dif- 
ferent rates. The temporal values of the 
rhythms and the rate of presentation at the 
recorded rate (RR) are approximately those 
of the rhythm discrimination items. The 
rates 4/5 and 5/4 are rates corresponding to 
the product of the RR rate and these two 
ratios. The rates were balanced for order 
in the first, second, and third positions. 
Rhythms were given at the same rate through- 
out any one trial. 

The short time intervals in the EIP test 
were 0.15, 0.30, 0.45, 0.60, and 0.70 second. 
Time values were equal during any one trial. 
In Series A the order of performance of the 
intervals was from shortest to longest. The 
order was reversed in Series B. 

In the two Seashore discrimination tests the
score in each case was the sum of the correct
responses on both sides of the record.
Scores for all the motor tasks were computed
in the same way. A kymograph carried
recording pens for the auditory signals to which
S responded, the responses on the telegraph
key, and a time (second) marker. The
rhythm and equal interval signals were
recorded throughout the periods when S
followed and reproduced them. The signal
circuit to S’s loud speaker was opened after S
had completed his last repetition for following.
The circuit remained intact to the signal
pen during reproduction by S. The S initially
listened to the first four repetitions of
an equal interval or rhythm. He then practiced
tapping in coincidence with the signals
for the next four repetitions. Repetitions 9
through 12 inclusive were scored for
reproducing. The distance between a signal pip
and its corresponding response pip was
measured, as was the separation of the
corresponding second marks on the time line.
The ratio of the two measurements gave an
error term in seconds. For a four-note rhythm,
for example, the error term per repetition was
the sum of the errors of each of the four pips.
During reproduction, the errors accumulated;
that is, the difference between a signal and its
response pip (displacement) tended to become
successively greater with each repetition of
the rhythm. No displacement greater than
100% (the time separating the first and last
notes of a rhythm) was measured. Instead,
a maximal error was assigned as will be
described later.


Table 1

Components of Rhythm Performance Test

Columns: Rhythm Number and Order of Presentation; Source and Identification; Percentage of Correct Responses by the Adult Standardizing Population (3); Interval Values in Hundredths of Seconds at “Recorded Rate” (b).

[The body of the table is garbled in the source. The entries that can still be read are the first two sources, Fraisse (1) and 1A-5 (a), and the interval strings 49-16-33-33, 16-49-33-33, 14-43-14-58-29-14, 29-29-43-14-58-29-14, and 43-29-14-43-58-14-29-58.]

a Read as Rhythm No. 1 of Seashore’s Series A, five notes
defining the rhythm (4).

b “Recorded rate” is approximately the rate and values at
which the rhythms were recorded on the Seashore records (4).
The RR values of Fraisse’s rhythm are the values he reported
to represent the most natural rhythm for his Ss (1).

When S gave the wrong number of re- 
sponses, or when the number of responses was 
correct, but the response pattern bore little 
resemblance to the signal pattern, or when S 
failed to respond, a predetermined maximal 
error score was assigned. This was equal to 
the time between the first and last signals of 
any one pattern times the number of signals 
in the pattern.²
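
The maximal error rule just stated can be written out directly. The sketch below assumes that a pattern is described by the intervals between its successive signals, so that a pattern with k intervals has k + 1 signals; the function name and the example rhythm are hypothetical, not taken from Table 1.

```python
def maximal_error(intervals_hundredths):
    """Maximal error score, in seconds, for one pattern.

    intervals_hundredths: intervals between successive signals,
    in hundredths of a second (k intervals imply k + 1 signals).
    """
    # Time between the first and last signals of the pattern.
    span_seconds = sum(intervals_hundredths) / 100.0
    n_signals = len(intervals_hundredths) + 1
    return span_seconds * n_signals


# Hypothetical four-note rhythm with intervals of .25, .50, and .25 sec.
example = maximal_error([25, 50, 25])
print(example)
```

For this hypothetical rhythm the span is one second and there are four signals, so the assigned score is four seconds.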

Sixteen repetitions (one trial) of an equal 
interval or of a rhythm were punched in a 
tape. The tape was fed at constant speed 
through a T.G. 10 F Keyer ³ which produced
signals for S’s loud-speaker as well as the 
thyratron circuit to the signal pen solenoids 
on the kymograph. The signal and recording 
apparatus were in one room while the Sea- 
shore tests were administered in a second. 
The S’s room was well removed from the first 
two rooms. An intercommunication system 
permitted E to talk with S as well as to 
monitor the auditory signals to S and his 
record. 

Method 


Procedure. The standard directions for the time
and rhythm discrimination tests were given to each
S. After these tests, he was shown the recording
apparatus and a blackboard diagram of the order
of presentation of the signals for the motor tasks.
A graphical illustration of the ideal relationship
between the auditory signals and his responses of
tapping on the key was presented. The kymograph
was explained, as was the arrangement of his work
space. He then received the following instructions:
“You will hear a series of clicks marking off a series
of time intervals, all of which will be equal in any
one series. Listen to the first four clicks, then
practice tapping the key in exact coincidence with the
next group of four clicks. Your responses will be
recorded on the kymograph beginning with the ninth
click. The record of your response will run parallel
to a record of the times when the clicks are given.
Try to depress the key at the exact moment of the
clicks. Even after the clicks no longer sound,
continue tapping out the intervals in exact reproduction
of the intervals you have just heard. Do not stop
until I tell you to do so. The series which you will
now hear is one in which the clicks are fifteen
hundredths of a second apart.” The foregoing
instructions were appropriately modified for the rhythm
series. The S was told to listen to the first four
repetitions of the rhythm, to practice on the next
four, and that his next successive responses both
with and without the auditory signals would be
recorded.

Subjects. Sixty Ss ranged in age from 15 to 55
years. About half were paid; the remainder were
volunteers. The Ss were drawn from the regular
student body, summer school registrants, graduate
students, faculty, and permanent residents of Ithaca,
New York.


2 See Weitz (7) for a concise statement of the
scoring problems for measures of motor rhythm
performance.

3 Gray Manufacturing Co., Hartford, Conn.


Results and Discussion 


The correlations between total error on 
rhythm performance and the rhythm and 
time discrimination scores were .66 and .44, 
respectively (see Table 2). Seashore’s as- 
sumption of a relationship between the dis- 
crimination and performances of rhythmic 
motor tasks is valid both for following and 
reproducing. The values he initially reported 
for the relationship between rhythm percep- 
tion and rhythm performance are essentially 
confirmed. The correlation between total 
errors on rhythm performance and on performance
of equal short intervals was .56.
The performance tasks did not correlate as 
highly with each other as did RD with RP. 


Whenever an RP was correlated with RD, 
with EIP, or with TD, the coefficients were 
usually highest with RD, followed by EIP 
and TD in that order. 

The error scores for reproducing were much
greater than those for following (p < .001),
yet the orders of the two scores yielded an r
of .96. This appeared to be due to the fact
that the added difficulty of reproducing was
approximately equal for most Ss. This
similarity in the two situations is emphasized by
the consistency of the pairs of r’s of following
and reproducing with RD, TD and EIP (see
Table 3). Also, the coefficients between
following and reproducing at different rates are
all in the eighties while those for the same
rates are even higher.


Table 2

Intercorrelations and Validity Coefficients
(All Rates Combined)

Columns: Discrimination Tests (Rhythm, Time); Performance Tasks (Equal Intervals, Rhythm Following). Rows: Rhythm Performance Errors (a): Total, Following, Reproducing.

[The body of the table is garbled in the source. Legible entries, in the order printed: 44, 47, 51; Reproducing: 65, 96.]

a Performance scores = total error in seconds per category.

b A coefficient of .325 is significantly different from zero at
the 1% level, N = 60.

The r’s between RP and RD, EIP, and TD 
are in that order, with RP vs. RD the highest 
for both following and reproducing at all 
speeds. 

Table 4 shows the intercorrelation matrix 
for the eight rhythms. The means of the co- 
efficients for Rhythms 1 and 2, .49 and .45 
respectively, are the smallest. Rhythms 3, 
4, and 5 were of intermediate performance 
difficulty. For this reason, they probably 
correlated higher with the other rhythms, 
their means being .64, .62, and .61, respec- 
tively. 
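
According to the paper's footnote on method, all means of r were computed with Fisher's z. A minimal sketch of that averaging is given below, using the standard transform (z is the inverse hyperbolic tangent of r); the seven correlations are hypothetical and are not entries of Table 4.

```python
import math


def mean_r(rs):
    """Average correlations by way of Fisher's z.

    Each r is converted to z = atanh(r), the z values are averaged,
    and the mean z is converted back to r with tanh.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical correlations of one rhythm with the other seven rhythms.
rs = [0.35, 0.42, 0.50, 0.55, 0.48, 0.61, 0.44]
print(round(mean_r(rs), 2))
```

Because the transform stretches large correlations, a mean computed this way is somewhat higher than the plain arithmetic mean when the r's differ widely.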

The range for the odd-even reliability
coefficients for EIP is .96 to .98. For RP at
rates 4/5, R/R, and 5/4 the odd-even
coefficients are all .98. These uniformly high
odd-even correlations probably mean that 
once an error was committed it was perpetu- 
ated throughout the series. The r between 
Series A and B of EIP was .58. This un- 
reliability could be due to practice effects or 
more probably to the shortness of the EIP 
trials. The r between following and repro- 
ducing when scores are summed for both se- 
ries of EIP is .86 and is further evidence for 
consistency within trials. 

The TD test adds little to the Seashore 
measures in this study. Rhythm discrimina- 
tion predicts EIP as well as or better than 
does TD. In all comparisons, RD is a better 
predictor of RP than is TD. It is possible, of
course, that TD may predict better the longer
time intervals which are found in actual
musical performance.

The different stimulus rhythms had differential
effects upon performance. For instance,
the mean of the eight r’s between following
and reproducing on the same rhythms
is .88, while the mean of the correlations
between scores for following on different rhythms
is only .56, for reproducing .52, and for
reproducing vs. following (excluding the same
rhythm) .59. A factor reducing the
intercorrelations between the different rhythms
may be that the nine-note rhythm exceeds or
approaches the memory span for some Ss. If
the memory span effect were functioning, this
should reduce the variance in common between
the other rhythms and Rhythm 8. The
means of the correlations with Rhythm 8 are,
in fact, lower than those reported above for
all rhythms. For instance, the mean of the
seven r’s correlating following on Rhythms
1-7 with following on Rhythm 8 is .46, for
reproducing .42, and for reproducing vs.
following .44. It is likely that part of the
variance, particularly with the longer rhythms,
is a function of memory span (cf. Seashore’s
r of .40 between motor rhythm performance
and memory span (5)).


4 Fisher's z coefficients were used for the
computation of all means of r in this paper (2).


Table 3

Correlations Between Discrimination and Performance Tasks with Differing Rates and Methods

Rows and columns: Performance Tasks (a): Rhythms, Following and Reproducing, each at rates 4/5, R/R, and 5/4, and Total; Equal intervals; Discrimination Tasks: Rhythm and Time.

[The body of the table is garbled in the source and the coefficients cannot be assigned to cells.]

a Performance scores = total error in seconds per category.


Table 4

Correlation Matrix of the Eight Rhythms

Rows and columns: Rhythms 1 through 8; Discrimination tests (Rhythm, Time); Equal intervals performance.

[The body of the table is not legible in the source.]

This study is an example of what might be 
called an intermediate test validation pro- 
cedure. The discrimination tests were tested 
against a criterion which was out of the con- 
text of the situations for which the tests must 
predict significantly in order to have practical 
validity. In one sense, this is a more effi- 
cient validation procedure, since the variance 
of the criterion is less subject to the influence 
of the multitude of factors affecting criteria 
such as grades and over-all rating scales. 


Summary and Conclusions 


Scores on the Seashore tests of time and 
rhythm discrimination were correlated with 
performance scores on eight rhythms, each of 
which was presented at three rates for a total 
of 24 trials. The above measures were also 
correlated with the performance of equal 
short time intervals. Positive and significant 
correlations between the discrimination and 
performance tests were obtained, confirming 
earlier results of R. H. Seashore. Rhythm 
discrimination scores correlated in all cases 
higher with rhythm performance than did the 
time discrimination scores. Rhythm discrimination
also correlated as well as or better than
did time discrimination with the performance 
of equal intervals. The need for inclusion of 
the Seashore time test as one of the musical 
measures was not confirmed here. 
Effects of High Intensity Noise on Retention ¹


Howard G. Miller 


North Carolina State College 


A major concern of military aviation in the 
United States has been the problem of the 
effect of the high intensity sound field cre- 
ated by the operation of the jet airplane en- 
gine on the ground and air personnel required 
to work in its presence. Psychologists have 
been primarily interested in the effects that 
such high intensity noise may have on per- 
formance or behavior. The results of their 
experimentation have been widely reported 
and are summarized by Kryter (5) and Ber- 
rien (2). The current concern of the mili- 
tary with the problem is reflected in the 
BENOX and CHABA reports (1, 4). The 
results of research thus far conducted on this 
problem have been somewhat ambiguous. In 
general, the more thorough and better con- 
trolled studies have demonstrated no marked 
or well defined effects of high intensity noise 
on any of the aspects of human performance 
studied. 

One important aspect of human perform- 
ance not well investigated has been the effect 
of high intensity noise on the learning, reten- 
tion, and reproduction of verbal materials. 
Characteristically in a military aviation set- 
ting, orders, instructions, and communications 
in general are transmitted orally from one 
person to another, and, at some later time, 
the second person is required to act on the 
basis of the instructions which he received 
earlier. There is no direct research evidence 
to indicate whether the reproduction of previ- 
ously learned material would be interfered 
with or affected in any way by the presence 
of a high intensity noise field at the time of 
reproduction. Forgetting and retention have,
of course, been thoroughly studied by
psychologists. The literature on the subject
suggests that it is possible that the presence of
a high intensity noise field at the time when
previously communicated orders or instructions
are being acted upon will have a deleterious
effect on their recall and on the performance
of the individual involved (6). It
is also possible that the high intensity noise
might interfere differentially with material
learned through auditory and visual stimulation
to the detriment of the auditorially
learned material. The following experiment
was designed and conducted to explore these
possibilities.


1 This article is based on a dissertation submitted
in partial fulfillment of the requirements for the
Ph.D. degree in the Department of Psychology at
the Pennsylvania State University.


Procedure 


The Ss were 48 male college sophomores drawn as 
volunteers from the Air ROTC classes at the Penn- 
sylvania State University. All Ss were required to 
take hearing tests, and those showing a loss of greater 
than 30 decibels on any of the frequencies tested by 
either the Maico D-5 or Maico E-1 audiometers 
were eliminated from the experiment. All Ss were 
paid one dollar per hour for the time spent on the 
experiment, and prizes were awarded to those Ss 
making the highest scores on the test material which 
was a part of the experiment. 

The sound source was designed to produce a high 
intensity noise similar to that created by the jet air- 
craft engine. The equipment used is an adaptation 
of that used by Stevens (7). The noise was ampli- 
fied and fed into the experimental room through a 
system of loud speakers. The noise level was 111 ±
1 db (ref. .0002 dyne/cm²). The spectrum tended to
be flat from the 100 to 6,000 cycles/sec. range and 
dropped in intensity beyond this range. 

Three learning tasks were employed in the experi- 
ment. Task I consisted of 5 equated lists of 15 
meaningful, 1 syllable, 4 letter words, drawn from 
the list of 1,000 words occurring most frequently in 
the English language according to the Thorndike 
word count (8). The lists were also equated pho- 
netically. 

Visual presentation of the lists in Task I was ac- 
complished by means of a Hull type memory drum. 
An interval of three seconds per item was employed. 
The learning period consisted of three exposures of 
each list. The lists were presented auditorially by 
means of a tape recorder, with intervals and number 
of exposures identical to those used for visual pres- 
entation. The recall test for Task I consisted of re- 
quiring the S to reproduce the list, as learned. This 
test was the same for all conditions. 

Task II consisted of five equivalent lists of 15 
meaningful statements, drawn from aviation terms 
and situations. Examples of these statements are as 
follows: “The elevation of Jones Field is 2000 feet.” 
“Replace tires on all airplanes in Squadron 25.”
Task II was presented visually by means of a mem- 
ory drum. Each list was exposed three times dur- 
ing the learning period with an exposure of six sec- 
onds per item. Auditory presentation was accom- 
plished by means of a tape recorder as with Task I. 
The recall test for Task II consisted of a presenta- 
tion of the original statement with one term of the 
basic information contained in the statement deleted 
and a blank space substituted as follows: “Jones Field
has an elevation of ______.”

Task III consisted of the learning of a series of
dial settings, for a series of numbers between 1 and 
100 inclusive, to which dials were to be set during 
the test period. Five lists of dial settings were com- 
posed, each containing a series of 10 numbers drawn 
randomly. The settings were uniformly presented, 
as follows: “Set the black dial at 39.” “Set the red 
and yellow dial at 72.” Visual and auditory pres- 
entation was identical to that described for Task II. 
In the testing for Task III, the Ss were required to 
set 10 dials on a panel at the learned settings. The 
dials were identified by colors. 

Following the training or acquisition period, all Ss 
were submitted to a 30-minute retention period of 
controlled activity during which they were shown 
sound motion pictures. 

The first phase of the experimental procedure con- 
sisted of a familiarization process in which all Ss 
were acquainted with all aspects of the experiment.
One of the five equivalent forms of each task was 
used during this period, and all Ss were given ex- 
perience with visual and auditory presentation of the 
tasks and quiet and sound testing for retention of the 
learned material. In the experiment proper four ex- 
perimental conditions were employed, with each S 
participating in all four conditions and learning all 
forms of all tasks. Under two of these conditions, 
Ss learned the three tasks by means of visual stimu- 
lation and were tested under noise for one of the 
conditions and under quiet for the other condition. 
The remaining two conditions involved auditory 
presentation of the tasks with testing conditions un- 
der noise and quiet as with the visual conditions. 
A retention period of 30 minutes was employed during
which the Ss viewed sound motion pictures.
These experimental conditions may be represented 
as follows: 


Training    Interpolated Activity    Test
Auditory    Sound motion picture     Sound
Visual      Sound motion picture     Sound
Auditory    Sound motion picture     Quiet
Visual      Sound motion picture     Quiet


The order of the presentation of the conditions was 
counterbalanced to eliminate serial effects of prac- 
tice, fatigue, and interaction. Since there exist 24 
possible combinations of order for the 4 conditions, 
two Ss were assigned to each of the combinations 
and, for the total experimental procedure, formed a 
pair. It was possible to schedule all Ss so that no 


fewer than three nor more than five days intervened 
between consecutive sessions. 
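
The counterbalancing scheme can be sketched as an enumeration of all orders of the four conditions, with two Ss assigned to each order. The condition labels and the subject numbering below are hypothetical conveniences; the source does not say how particular Ss were matched to orders.

```python
from collections import Counter
from itertools import permutations

# Four conditions: training modality / test condition.
conditions = [
    "auditory/sound",
    "visual/sound",
    "auditory/quiet",
    "visual/quiet",
]

# All possible orders of the four conditions (4! of them).
orders = list(permutations(conditions))
print(len(orders))

# Assign two subjects (hypothetical IDs 1-48) to each order.
subjects = range(1, 49)
assignment = {s: orders[(s - 1) // 2] for s in subjects}

# Each order should be used by exactly two subjects.
uses_per_order = Counter(assignment.values())
print(set(uses_per_order.values()))
```

With 24 orders and two Ss per order, the scheme uses exactly the 48 Ss of the experiment.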

At the completion of the fourth experimental ses- 
sion, all Ss were required to fill out a questionnaire 
in which they described their estimate of the effect 
of the noise stimulus on the recall process and their 
general reaction to the noise stimulus.


Results 


The scores of the Ss were analyzed to an- 
swer the two basic questions proposed in this 
study: (a) Does high intensity noise, similar 
to that produced by the jet airplane engine, 
affect the recall of verbal material learned 
under controlled conditions? and (b) Does
high intensity noise affect differentially the 
recall of material learned by means of visual 
and auditory stimulation? The information 
obtained from the questionnaire was also sur- 
veyed. The following comparisons were made. 
Recall for all tasks was compared under sound 
and quiet conditions. The effect of noise on 
variability was determined by comparing 
standard deviations of groups under sound 
and quiet by the use of F ratios. Because 
some researchers (3, 9) have reported in- 
creased output with diminished accuracy un- 
der noise conditions, production and errors 
were compared under sound and quiet condi- 
tions. The question concerning the differ- 
ential effect of high intensity noise on the 
recall of visually and auditorially learned ma- 
terial was explored by comparing the differ- 
ence between noise and quiet scores for visual 
learning with the difference between noise 
and quiet scores for auditory learning. For 
all of these comparisons, none of the differ- 
ences tested approached statistical signifi- 
cance. 

Conclusions 


1. Noise of the intensity and frequencies 
employed in this experiment (111 db, ap- 
proximately flat frequency spectrum to 6000 
cycles/sec.) does not significantly affect the 
recall of verbal material learned under con- 
trolled conditions. 

2. The recall of material learned by means 
of auditory stimulation is not interfered with 
by noise to a greater extent than the recall of 
material learned by visual stimulation. 

3. According to the subjective reports of Ss, 
the noise stimulus employed aroused minimal
reactions in regard to eight psychological and 
somatic categories of reaction including gen- 
eral disturbance, irritation, distraction, nerv- 
ousness, fright, nausea, pain, and dizziness. 
The reports of the Ss centered around mild 
complaints of irritation, distraction, and gen- 
eral disturbance. 

4. Subjective reports of the Ss indicated
some initial disturbance due to the noise
stimulus, but adaptation was quickly
achieved and, following adaptation, the
noise was not perceived as a noxious stimulus.


Received December 4, 1956. 
Validation of the Minnesota Scale of Parental Occupations
and a Modification of the Warner Occupational
Rating Scale 1


Harry Beilin 


University of Minnesota 


The assessment of socioeconomic status is 
achieved in a variety of ways. In some psy- 
chological, sociological, and educational in- 
vestigations, occupation is used as the princi- 
pal indicator. Occupational classifications, in 
turn, have been based upon intellectual (1), 
prestige (4, 5) and other criteria. A number 
of investigators prefer as a measure of socio- 
economic status some environmental rating 
including the possession of material and other 
goods (7, 11, 13). Still others, Warner for 
example, are interested in composite indices 
based upon income, dwelling and neighborhood
ratings, occupations, etc. (15).

Since, on its face and from evidence in the
literature, the possession of material and cultural
items appears to be an adequate reflection
of general socioeconomic status, these
have been used as the criteria for the present 
study, and two occupational rating scales have 
been related to them. Other factors such as 
intelligence and education have been intro- 
duced, since the relationship of these to socio- 
economic status has long been recognized. 

Some occupational classification systems 
have been utilized for a number of years 
without revision, even though the occupa- 
tional structure of our economy has in many 
ways changed (3). One of the prime difficul- 
ties in evaluating and using most occupa- 
tional scales is that they are based upon more 
than one criterion, and an attempt to insti- 
tute some hierarchic organization is con- 
founded by the multidimensional character of 
the occupational structure. The Census clas- 
sification scheme, to put it broadly, employs 
a skill level breakdown for the lower end of 


1 This study was supported, in part, by research
grant (M-690) from the National Institute of Men- 
tal Health, U. S. Public Health Service, and, in part, 
by funds supplied by the Institute of Child Welfare, 
University of Minnesota. The principal investigators 
of the investigation known as the Nobles County 
Project are J. E. Anderson and D. B. Harris. 


the scale and a field breakdown for the upper. 
In social mobility studies where the hierarchic 
nature of a classification scheme is probably 
its most important characteristic, clear delineation
of such a hierarchy is often lacking.
An intergeneration occupational change, for
example, from high school teacher to presi- 
dent of a large manufacturing company would 
under ordinary circumstances be interpreted 
as a rise in social status. If one were to indi- 
cate this change, however, in some Census 
type schemes, it might be indicated actually
as a move “downward” (i.e., from professional [I]
to proprietor-managerial status [II]).
This limitation is not confined, however,
to social mobility studies, for the general 
utility of any socioeconomic scaling is re- 
duced by the lack of a clear hierarchy. For 
this reason, the present study attempts, in 
part, to validate an occupational scale de- 
signed to more sharply delineate levels of so- 
cial and economic status. 


THE OCCUPATIONAL RATING SCALES 


The Minnesota Scale of Parental Occupa- 
tions (MOS), which is one of the occupa- 
tional scales studied (6, 9), has been in serv- 
ice for many years (since 1925). Used first 
as a sampling device, it has come to be em- 
ployed fairly extensively as a measure of 
socioeconomic status. It has a heavy intel- 
lectual emphasis, and, on the whole, it ap- 
proximates the Census (Edwards) type clas- 
sification. The authors (Goodenough-Ander- 
son) indicate in the most recent edition of 
the manual (9) that changes in occupations 
since the construction of the scale may neces- 
sitate revision, but this has not as yet been 
undertaken. The second scale studied is a
modification of one constructed by Warner 
and his associates (15). A revision of this 
scale was prepared by the author for a study 
in social mobility (2). This entailed a subdivision
of the top occupational level and
greater specification of agricultural occupa- 
tions. The rationale underlying the Warner 
scale and the present modification rests upon 
the thesis that in a socioeconomic hierarchy 
the crucial differentiation is not in regard to 
field of work but rather the level of an occu- 
pation within a field. The professional oc- 
cupations, for example, are not grouped as 
Class 1 as in the Census, Minnesota, and other 
scales, but instead range over a number of 
levels. The skilled occupations, and others, 
do likewise. (Reproduction of the revised
scale will be found in [2].)


The Criteria 

Material and cultural possession criteria 
have been used in the socioeconomic rating 
scales of Sewell (14), Gough (7), Sims (13), 
The American Home Scale (10), Leahy’s 
Minnesota Home Status Index (11), and 
others.2

The criterion data for the present study 
were obtained on a form, the Personal Data
Sheet (PDS), which was designed to determine
the extent to which a series of material
and cultural possessions graded in value were 
possessed in the home of the S. Included in 
the PDS are many items from the Gough, 
Sewell, and Leahy scales. The PDS yields 
three scores: a material, a cultural, and a 
composite total score. 

The material scale reflects, among others, 
the possession of the following: central heat- 
ing, auto (kind and number), refrigerator, 
deep freeze, automatic washer, shower, fire- 
place; and in addition seeks to determine: 
the nature of the plumbing (inside, outdoor), 
type of heating, and type of house construc- 
tion. 

The cultural score is based upon such items 
as: the reading of newspapers, the culture 
content of magazines read, the number of 
books possessed, etc. 

The rationale of the scoring is based upon 
the assumption that the possession of a 
greater number of material goods, or the 
more economically valuable ones, is consistent 
with a higher socioeconomic status. An analogous
rationale exists for the cultural items
with a culturally valued criterion substituted
for an economically valued one.


2 A review of the scales in use up to 1936 will be
found in Leahy (11).

The scoring of the culture content of the 
magazines is based upon the Morgan-Leahy 
ranking (12), revised and restandardized to 
reflect current magazine circulation, and in- 
cludes as well farm journals which were not 
included in the 1934 Morgan and Leahy 
study. 

The culture, material, and total scores re- 
sult from the unweighted addition of indi- 
vidual item scores. 

The PDS was administered to the 8th and 
10th grades of a high school in St. Paul, Min- 
nesota. In selecting the sample for the pres- 
ent study a random 150 cases were chosen 
from this metropolitan area with approxi- 
mately an equal number of each sex, and the 
same number from the corresponding grades 
from an essentially rural county in southern 
Minnesota. Intelligence test data were available
for the rural group but not for the metropolitan
sample.

Father’s occupation was classified accord- 
ing to the Minnesota Scale (MOS) and the 
author’s revision of the Warner Scale (BOS). 
Reliability of classification using the MOS 
was determined by two psychologists, one of 
whom had been in this country only four 
years. The correlation between their occu- 
pational classifications is .77. 
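
The reliability figure above is a correlation between two raters' classifications of the same occupations. A minimal sketch of such a computation follows, assuming a Pearson product-moment coefficient (the text does not name the coefficient used). The ratings shown are invented, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented occupational class ratings from two hypothetical raters.
rater_1 = [1, 2, 3, 5, 7, 4]
rater_2 = [1, 3, 3, 5, 6, 4]

inter_rater_r = pearson_r(rater_1, rater_2)
print(round(inter_rater_r, 2))
```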


Results 


1. The intercorrelations, indicated in Ta- 
ble 1, show no relation between occupations 
of the fathers of the Ss and the intelligence 
test scores of the Ss. Although a relation- 
ship between occupation and intelligence has 
previously been demonstrated (8), the effect 
of statistical regression upon intelligence from 
one generation to the next may account for 
the lack of relationship in the present in- 
stance. The restriction of range from the 
predominance of agricultural occupations in 
this group may affect the results as well. As 
indicated, comparable data are not available 
for the urban sample. 

Table 1

Intercorrelations of Occupational Rating Scales With Other Variables for a Rural and Urban Sample

(N (rural) = 156; N (urban) = 155)

                                                  IQ (Rural)
Minnesota Scale of Parental Occupations (MOS)        .05
Beilin Modification of Warner Scale (BOS)            .09
Personal Data Sheet Total Score (PDS)                .44
Material Score                                       .32
Cultural Score                                       .36
Fathers’ Education                                   .13

[Remaining columns (BOS, PDS Total Score, Material Score, Cultural Score, and Fathers’ Education, Rural and Urban) are not recoverable.]


2. The result of relating the occupational
classifications to the PDS total score indicates
higher correlations in the rural group
than in the urban and higher correlation for
the BOS than the MOS. The reason for this
may lie in the fact that the BOS more adequately
differentiates among agricultural occupations,
which, from face examination of
the two scales, would appear to be true.
(The MOS classifies agricultural occupations
in three groups, the BOS in seven.)

It may well be that the criterion of possession
of material and cultural items in
the home no longer adequately differentiates
among socioeconomic groups and so accounts
for the low correlations that were generally
obtained. The rise in the economic level of
most classes with the attendant improvement
in standard of living has placed most of
the materials included in such scales within
the reach of most householders (e.g., indoor
plumbing, radios, etc.). The reason, however,
for the higher correlation of the material
scale and BOS for the rural group results
possibly from the slight lag between
rural and urban populations in the acquisition
of these items. The fact that the correlation
of the material to the cultural scale is higher
for the rural (.62) than urban group (.33)
would be interpreted, too, as indicating that
there tends to be greater consistency with regard
to the possession of material and cultural
items within socioeconomic classes in
the rural than the urban group. Although
there may be greater usefulness of material-cultural
scales in rural areas, it is doubtful
whether the extent of differentiation achieved 
by them will warrant their use. The con- 
tinuing changes in rural life make it even 
more doubtful. 

3. Both occupational classifications of fa- 
ther’s occupation (MOS and BOS) correlate 
about equally with father’s education. The 
cultural scale of the PDS correlates more 
highly with father’s education than does the 
material scale, as would be expected. 

4. The PDS total score has substantial correlation
with father’s education (.52-.51)
and IQ (.44). It would appear from this 
that the composite material-cultural score is 
more highly related to education and intelli- 
gence than is true of the occupational rating 
scales. 

5. The MOS and BOS correlate .67 and
.70 (urban and rural) with each other.

It may be relevant at this point to note 
that this study, which was designed to vali- 
date two occupational scales employing ma- 
terial and cultural possessions as criteria, has 
led to some questioning of the adequacy of 
the criteria. The foregoing is based upon the 
assumption that occupation still does differ- 
entiate. One can cite in justification the 
Warner finding (15) that the zero-order correlation
between occupation and social-class
participation was .91. The data reported
here suggest that both urban and rural families
are no longer adequately differentiated in
terms of socioeconomic status, according to
the possession of items traditionally included
in the kind of scale employed in the present 
study. In our present economy, for example, 
refrigerators appear in 96% of the nation’s
wired homes, 86% have electric washers, 66%
vacuum cleaners and 75% own automobiles.
It may, of course, be that new material and 
culture goods have entered the economy since 
the 1930's which would adequately differenti- 
ate among the various socioeconomic groups 
in spite of the general economic and probably 
cultural upgrading of the population. Television
sets would not be one such item, however,
since 81% of wired homes possess them.
Whether others do remains to be investigated. 


Summary 


Validating the Warner and Minnesota type 
scales against cultural and material possession 
criteria indicates substantial but not high cor- 
relation. This is explained on the basis that 
material possessions and to some extent cul- 
tural items (of the kind indicated) are so 
widely distributed in the population as to no 
longer act as adequate socioeconomic differ- 
entiators. There is some indication that the 
modification of the Warner scale used here 
more adequately differentiates among agricul- 
tural occupations than the Minnesota scale. 

A scale based upon possession of household 
and cultural items appears to be more highly 
related to intellectual and educational factors 
than is occupation as measured by the Min- 
nesota and a modified Warner scale. It 
would appear from inspection that the Min- 
nesota scale is in need of revision to make the 
classification of occupations more consistent 
with the nature of contemporary occupations. 
Manual Dexterity of the Gloved and Bare Hand as a
Function of the Ambient Temperature and
Duration of Exposure 1


Leonard S. Rubin 2


Chemical Warfare Laboratories 


In numerous industries and throughout the 
armed forces, men are required to wear gloves 
for protection against environmental hazards, 
e.g., temperature extremes, toxic materials, 
etc., while performing tasks that require considerable
manual dexterity. While the effects
of numerous variables on dexterity of the
bare hand have been studied, few studies have
considered the gloved hand. This study is 
concerned with a relative evaluation of the 
dexterity afforded by several gloves when 
worn for varying periods of time under vari- 
ous ambient temperatures. 

It is well known that the extremities are 
most subject to heat loss because of their 
large surface area relative to their mass (8). 
A few objective studies have shown that dex- 
terity of the bare hand is impaired at low 
temperatures. Bartlett and Gronow (2) have 
shown significant impairment of the bare hand 
when Ss were subjected to 14° F. Perform- 
ance deteriorated as the exposure period in- 
creased from one-half to one and one-half 
hours. Mills (5) found that dexterity of the 
bare hand was impaired significantly after a 
half hour at 24° to 14° F and more quickly 
at lower temperatures. Furthermore, he found 
that the extent of impairment was positively 
correlated with impairment of the tactile sen- 
sitivity of the index finger tip. 

Hunter et al. (4) have shown that the 
joints of an extremity undergo significant 
temperature changes when exposed to cold. 
If kinesthetic receptors of importance are lo- 
cated in the interphalangeal joints, and their 
thresholds are raised during exposure to cold, 
then it may be deduced that the sensory in- 
formation required for accurate control of the 
fingers would be absent. 

1 This study could not have been performed without
the technical assistance of A. Karasik, A. Wellner,
and F. Rosenberg.


2 Now at Eastern
In addition to the sensory impairment occasioned
by cold, the quietness, smoothness,
and strength of finger movements are affected
by an increased viscosity of the synovial fluid.

The numerous factors which affect the dex- 
terity of the cold bare hand would also be 
present in the situation where the gloved 
hand is considered. In addition, in evaluat- 
ing different gloves it is conceivable that the 
differences in physical properties would dif- 
ferentially affect heat loss and concomitant 
variation in tactile and kinesthetic sensitivity as
well as motor responsiveness. The insulating 
properties of a glove affect sweat production 
which provides a fluid environment for the 
hand at higher temperatures, which may also
impair dexterity. Finally, those physical 
properties that determine flexibility and re- 
siliency will differentially affect the dexterity 
of the gloved hand. The purpose of this experiment
was to evaluate the dexterity afforded
by two commercially available neoprene
gloves that provide protection against
chemical injury to the skin under conditions 
that could prevail when the glove was worn. 


Apparatus and Procedure 


Dexterity tasks. In order to investigate the effect 
of the experimental variables on manual dexterity, 
it was necessary to select tests which would measure 
it. Seashore, Jerome, and Harney, in an unpublished 
study, used 20 common manual skills and found that 
the skills employed did not correlate significantly 
with each other (see 6, p. 1351). Teichner et al. 
(7), for example, have stated that there is no general
test of manual dexterity; rather, individual tests
evaluate specific dexterities.

In view of this lack of a single valid measure, sev- 
eral dexterity tasks were employed which it was 
thought were sufficiently reliable and sensitive to 
variations in dexterity which might be occasioned by 
the experimental conditions. Two tests were modeled
after those employed by Bartlett and Gronow of the
RAF Institute of Aviation Medicine in 1952. These
tests, which were not significantly correlated with
each other, were the “nut and bolt” test and the
“screwdriver” test (r = 0.33, P = 0.10). Another
test required the S to pick up washers of different 
diameter and varying thickness. The fourth task, 
introduced for the purpose of approximating face 
validity, required that S manipulate the contents of 
a Chemical Corps item, the E28 Chemical Warfare 
Agent Detector Kit. 

As the nut and bolt test (Task B) and screwdriver 
test (Task C) have been described and illustrated 
previously by Bartlett and Gronow, only the washer 
test (Task A) and the military task (Task D) will 
be described. 

The washer test (Task A) consisted of a flat metal 
plate on which was arranged a series of washers of 
varying thickness and diameter. The washers were 
constructed in three diameters and five thicknesses 
for each diameter. The diameters were 1.0, 1.25, 
and 1.75 in. and the thicknesses were .050, .075, .100, 
.125, and .150 in. Starting with the smallest, thinnest
washer, the S was required to pick up each of
the washers and to place them on wooden pegs.
There was one peg for each of the three diameters
of 1/8, 2/8, and 3/8 in. (see Fig. 1). The S then
was required to pick up the washers, one at a time,
and put them back in their former positions. The 
task was considered complete and the time required 
for completion was recorded in seconds when the 
last washer was put back in place on the board. 
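
The washer set described above (three diameters, five thicknesses for each diameter) can be enumerated as a simple cross product. This is a sketch only; the assumption that Ss proceeded by diameter first and then by thickness is an illustrative reading of "starting with the smallest, thinnest washer", and the constant names are hypothetical.

```python
from itertools import product

# Dimensions in inches, as given in the text.
DIAMETERS_IN = (1.0, 1.25, 1.75)
THICKNESSES_IN = (0.050, 0.075, 0.100, 0.125, 0.150)

# Assumed pick-up order: smallest diameter first, thinnest washer first within it.
washers = list(product(DIAMETERS_IN, THICKNESSES_IN))

print(len(washers))
print(washers[0], washers[-1])
```

Under this reading the board holds 15 washers, each handled twice per trial (once onto a peg, once back to the plate).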

Task D required that S manipulate the contents of 
the E28 CW agent detector kit in accordance with 
the procedure normally employed to detect the presence
of sundry chemicals. The chemical detection
kit consists of two rubber squeeze bulbs, a number 
of detection tubes sealed in lead foil, and reagent 
bottles with droppers. 

The S’s task was to go through the same manual 
procedures that would ordinarily be followed in the
field. First the kit is opened and one rubber bulb 
is removed; then one foil-wrapped detector tube is 
removed. The foil is stripped from the tube, and 
the tube is inserted into the rubber receptacle at the 
end of the rubber bulb. Then with the bulb in 
palm, it is squeezed six times. The tube is then re- 
moved from the bulb and both are put down on a 
table. Next the reagent bottle is removed from the 
kit and opened. Then the eyedropper is filled and
four drops of liquid are run into a tube. The tube
is then set aside and the kit is repacked by putting
the rubber bulb and the reagent bottle back in place.

Fig. 1. “Washer Test.”

All tests had to be performed as quickly as pos- 
sible and the time required for completion was re- 
corded in seconds by a stopwatch. 

Hand conditions. The above enumerated tasks 
were presented to Ss who were required to manipu- 
late them as quickly as possible while they were per- 
forming under one of the five glove conditions. Still 
another group performed all of the tasks with the 
bare hand. The five glove conditions of the experi- 
ment were: (a) Goodrich glove, a molded neoprene 
latex glove with a modulus (tensile-stress) of 507 
lb./sq. in., an ultimate elongation of 736%, and a 
tensile strength of 2,195 lb./sq. in.; (b) Stanzoil 
glove, a neoprene latex glove containing a layer of 
nylon with a modulus of 646 lb./sq. in., an ultimate
elongation of 673% and a tensile strength of 3,025
lb./sq. in.; (c) Cotton glove; (d) Goodrich glove
with cotton glove as liner; (e) Stanzoil glove with 
the cotton glove as liner. 

Temperature. Four ambient temperature condi- 
tions were employed in the experiment, 25°, 50°, 75°, 
and 100° F. These temperatures were maintained at 
± 2° F in the climatic facility of the Chemical Corps
Medical Laboratories (1). 

Seventy-two male volunteers from the Army 
Chemical Center were employed as Ss. Not only 
were they exposed to the temperature conditions of 
the experiment, but they were required to remain in 
the facility for varying amounts of time, e.g., 40, 80, 
120 minutes before they were tested on each of the 
four dexterity tasks. They were comfortably and 
appropriately clad for each temperature. Before the 
experiment was begun, each S received two familiari- 
zation trials of five minutes duration when the tasks 
were manipulated with the bare hand. In addition, 
several anthropometric measurements were obtained 
for the right hand of each S, e.g., inside hand breadth 
at metacarpal; length of hand, volar, from navicular; 
length of middle finger, volar; girth of wrist; thick- 
ness of metacarpal III; length of phalanx I, middle 
finger. A thermocouple was placed on the tip of the 
middle finger of each hand during each experimental 
trial so that skin temperature measurements could 
be recorded continuously without impeding the 
movement of the hands or fingers. 

Each glove condition and the bare hand were 
tested under every temperature and duration of ex- 
posure on 12 Ss who had been assigned at random 
to the conditions. Each group of 12 Ss under each 
of the hand conditions was divided into three sub- 
groups. Each subgroup was required to remain in 
the facility for either 40, 80 or 120 minutes. Each 
S within a subgroup received four trials in the cli- 
matic facility under a different temperature condi- 
tion and performed a different dexterity task on 
each trial.
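
The design described above can be sketched in code. This is an illustrative reconstruction under stated assumptions, not the authors' assignment procedure: the random pairing of temperatures with tasks within each S, the fixed seed, and the names `assign_design` and `plan` are all hypothetical.

```python
import random

HAND_CONDITIONS = (
    "Bare hand",
    "Goodrich",
    "Stanzoil",
    "Cotton",
    "Goodrich with cotton liner",
    "Stanzoil with cotton liner",
)
DURATIONS_MIN = (40, 80, 120)
TEMPERATURES_F = (25, 50, 75, 100)
TASKS = ("A", "B", "C", "D")
SS_PER_HAND_CONDITION = 12


def assign_design(seed=0):
    """Build one trial plan: each S gets 4 trials, each at a different
    temperature and with a different task (pairing chosen at random here)."""
    rng = random.Random(seed)
    plan = []
    subject_id = 1
    per_subgroup = SS_PER_HAND_CONDITION // len(DURATIONS_MIN)
    for hand in HAND_CONDITIONS:
        for duration in DURATIONS_MIN:
            for _ in range(per_subgroup):
                temps = list(TEMPERATURES_F)
                tasks = list(TASKS)
                rng.shuffle(temps)
                rng.shuffle(tasks)
                plan.append({
                    "subject": subject_id,
                    "hand": hand,
                    "duration_min": duration,
                    "trials": list(zip(temps, tasks)),
                })
                subject_id += 1
    return plan


plan = assign_design()
total_trials = sum(len(s["trials"]) for s in plan)
print(len(plan), total_trials)
```

Six hand conditions with 12 Ss each give 72 Ss, and four trials per S give 288 observations in all.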


Results 


Dexterity as a function of the tasks. In
that all Ss received all tasks in a randomized
order, product-moment coefficients of correlation
were obtained between tasks and tested
for significance in order to ascertain the extent
to which the tasks were measuring some
common factor. Correlation matrices between
tasks were obtained for all five glove
conditions and for the bare hand.


Table 1
Correlation Matrix Over All Glove Conditions
(Pearson r)

              Tasks
         B         C         D
A      .40**     .40**    [illegible]
B                .36**     .27
C                          .22

** Significant at 1% level.

The correlation matrix in Table 1 was obtained
by correlating the logarithmic transformation
of the time scores (secs.) obtained
on each task by each of the 60 Ss regardless 
of the particular glove condition to which 
each of the five groups had been subjected. 
The major feature of the data is the existence 
of significant correlation among all of the 
laboratory tasks employed to measure dex- 
terity (Tasks A, B, and C) and the lack of 
a significant correlation between the one mili- 
tary task included in the battery for the pur- 
pose of face validity and any of the standard 
laboratory tasks. 

Table 2 was obtained by correlating the 
logarithmic transformation of the time scores
(secs.) obtained on each of the four tasks by 
the 12 Ss who used the bare hand in manipu- 
lating the objects. For 11 df, an r = .68 is 
required for significance at the 1% level. It 
is apparent that no correlation coefficient 
within the matrix obtained for the bare hand 
is significant. 
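
The critical value of r quoted above can be checked from the usual relation between r and t, namely r = t / sqrt(t^2 + df). The sketch below assumes the two-tailed 1% value of t for 11 df taken from standard tables (3.106); that constant and the function name `critical_r` are not from the article.

```python
import math


def critical_r(t_critical, df):
    """Convert a critical t value to the corresponding critical r."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumption: two-tailed 1% critical t for 11 df, from standard t tables.
T_CRIT_1PCT_DF11 = 3.106

r_crit = critical_r(T_CRIT_1PCT_DF11, df=11)
print(round(r_crit, 2))  # 0.68
```

The largest coefficient in the bare hand matrix (.46) falls well short of this value.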

The correlations obtained with the bare 
hand are commensurate with the results ob- 
tained previously by many experimenters in 
that the low, insignificant correlation suggests 
that the tasks employed to measure dexterity 
are independent and presumably measure specific dexterities.
However, the results obtained with
the gloved hand are unusual. The finding 
that the military task employed did not correlate
significantly with any of the laboratory
tasks raises the question of the validity
of any approach that employs laboratory tasks 
solely in an effort to measure the effect of 
sundry variables on dexterity as involved in 
the performance of military or industrial tasks 
when the hand is gloved. 

Dexterity as a function of the experimental 
variables. Since the average time (secs.) to 
perform the Washer Board Test, the Nut and 
Bolt Test, the Screwdriver Test, and the Mili- 
tary Task over all other conditions was ap- 
proximately proportional to 1.00, 1.06, 1.31, 
and 1.11, these factors were used to divide 
times of performance and so make the influ- 
ence of the tests comparable to each other. 
This was necessary in order to apply the 
analysis of variance technique subsequently 
carried out. 
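
The scaling step described above amounts to dividing each time score by the factor for its task. A minimal sketch follows; the task letters are matched to the factors in the order the tests are listed in the text, and the function name `scale_time` is hypothetical.

```python
# Proportionality factors from the text, keyed by task:
# A = Washer Board Test, B = Nut and Bolt Test,
# C = Screwdriver Test, D = Military Task.
TASK_FACTORS = {"A": 1.00, "B": 1.06, "C": 1.31, "D": 1.11}


def scale_time(task, seconds):
    """Divide a raw time score by its task factor to make tasks comparable."""
    return seconds / TASK_FACTORS[task]


# Invented raw times (secs.) for illustration.
raw_times = {"A": 100.0, "B": 106.0, "C": 131.0, "D": 111.0}
scaled = {task: scale_time(task, t) for task, t in raw_times.items()}
print({task: round(value, 1) for task, value in scaled.items()})
```

Raw times that differ only by the task factor come out equal after scaling, which is the intent of the adjustment.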

The data analyzed are in logarithmic form
because it was found that the means were
proportional to the variances. The logarithmic
transformation made the means independent
of the variances as determined by Bartlett’s
test which yielded a chi square equal to
8.2030, P > 0.95 for 17 df.
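
For readers who wish to reproduce this kind of check, the standard form of Bartlett's test for homogeneity of variances is sketched below. It is assumed, not confirmed by the article, that this textbook form was the one applied; the function name `bartlett_statistic` and the sample data are hypothetical. With k groups the statistic is referred to chi square with k - 1 df, so 17 df corresponds to 18 groups.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return (chi_square, df) for Bartlett's test of equal variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]
    n_total = sum(sizes)
    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (
        sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)
    ) / (3 * (k - 1))
    return numerator / correction, k - 1


# Invented time scores (secs.), log-transformed before testing.
raw_groups = [[40, 55, 62, 48], [80, 95, 130, 110], [150, 210, 180, 240]]
log_groups = [[math.log10(t) for t in g] for g in raw_groups]

chi_square, df = bartlett_statistic(log_groups)
print(round(chi_square, 3), df)
```

A small chi square relative to its df, as reported in the text, indicates that the transformed variances do not differ appreciably.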

A three-variable factorial analysis (four 
replicates) was conducted for hand condi- 
tions, temperatures, and durations, and the 
results are shown in Table 3. The only major
significant difference observed was between
all hand conditions.
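
The sums of squares and mean squares in Table 3 allow the degrees of freedom and F ratios to be back-calculated. The sketch below does this under two assumptions that are not stated in the article: that the SS and MS entries follow the order in which the sources are listed, and that every F is formed against the residual mean square. The derived values are computed here, not quoted.

```python
# (source, SS, MS) as read from Table 3, in the order the sources are listed.
TABLE_3 = [
    ("Hand conditions (H)", 9.4013, 1.8803),
    ("Temperatures (T)", 0.2813, 0.0938),
    ("Durations (D)", 0.0131, 0.0065),
    ("H x T", 0.4445, 0.0296),
    ("H x D", 0.1379, 0.0138),
    ("T x D", 0.2065, 0.0344),
    ("H x T x D", 0.9997, 0.0333),
    ("Residual", 10.2184, 0.0473),
]


def derive_df_and_f(rows):
    """Back-calculate df = SS / MS and F = MS / residual MS for each source."""
    residual_ms = rows[-1][2]
    derived = []
    for source, ss, ms in rows:
        df = round(ss / ms)
        f_ratio = None if source == "Residual" else ms / residual_ms
        derived.append((source, df, f_ratio))
    return derived


derived = derive_df_and_f(TABLE_3)
for source, df, f_ratio in derived:
    f_text = "" if f_ratio is None else f"{f_ratio:.3f}"
    print(f"{source:22s} df={df:3d} F={f_text}")
```

The derived degrees of freedom sum to 287, which agrees with 288 observations, and the derived F for temperatures agrees with the 1.984 legible in the table to within rounding.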

The least significant difference required at 
the 1% level between mean log time (secs.) 
for hand conditions is 0.1133. Table 4 pre- 
sents a matrix of these means for each hand 
condition and the level of significance of the 
differences. 

Table 2

Correlation Matrix for the Bare Hand
(Pearson r)

              Tasks
         B         C         D
A      .16       .03       .37
B                .46       .29
C                         [illegible]


Table 3

Summary Analysis of Variance for Hand Conditions, Durations, and Temperatures

Source                                            SS         MS         F
Hand conditions (H)                             9.4013     1.8803    [illegible]***
Temperatures (T)                                0.2813     0.0938     1.984
Durations (D)                                   0.0131     0.0065
Hand conditions X temperatures                  0.4445     0.0296
Hand conditions X durations                     0.1379     0.0138
Temperatures X durations                        0.2065     0.0344
Hand conditions X temperatures X durations      0.9997     0.0333
Residual                                       10.2184     0.0473

Total                                          21.7027

*** Significant at the .001 level.
* No significant difference.


Table 4

Difference in Mean Log Latency (secs.)
Between Hand Conditions

                                                 Goodrich                    Stanzoil
                       Liner        Goodrich     and Liner     Stanzoil      and Liner
Bare Hand           [illegible]  [illegible]**    .4352**    [illegible]**    .5320**
Liner                            [illegible]      .2512**      .2451**      [illegible]
Goodrich                                          .0446        .0375          .1410**
Goodrich and Liner                                           [illegible]    [illegible]
Stanzoil                                                                      .1035

** Critical difference required for significance at 1% level is 0.1133.


Manipulation of the various tasks by the
bare hand was significantly superior to any
other hand condition with respect to the time
required to complete the tasks. When the
close fitting cotton liner was worn, the Ss
performed significantly faster as compared to
any of the conditions in which a glove was
worn but significantly slower than the bare
hand. A comparison with the bare hand
demonstrates a 10.7% impairment of function
(see Table 5) occasioned by wearing the
cotton liner. Performance on the tasks was
not significantly differentiable for the condition
in which Ss wore either a Goodrich glove,
a Stanzoil glove or the Goodrich glove in combination
with the liner. The degree of impairment
occasioned by these hand conditions
was 22.1%, 24.9%, and 25.2%, respectively.
The Goodrich glove alone, however, permitted
the S to work more rapidly than when he was
wearing the Stanzoil glove in combination
with the liner. When the Goodrich glove was
worn with the liner, performance on tasks was
about equal to performance with the Stanzoil
glove and liner combination. Finally, no significant
difference in time required to complete
the tasks existed between the Stanzoil
condition and the Stanzoil and liner combination.

A summary of the breakdown of the sig- 
nificant effects indicates no significant differ- 
ence between the two gloves (Goodrich and 
Stanzoil) with or without liners. 
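
The impairment percentages cited above are relative increases over the bare hand, that is, the difference between the glove mean and the bare hand mean, divided by the bare hand mean and multiplied by 100. A minimal sketch of that calculation follows. The two means used are invented round numbers, not values from the study, and `percent_impairment` is a hypothetical function name.

```python
def percent_impairment(gloved_mean, bare_mean):
    """Percentage increase in time for a glove condition relative to the bare hand."""
    if bare_mean <= 0:
        raise ValueError("bare hand mean must be positive")
    return (gloved_mean - bare_mean) / bare_mean * 100.0


# Invented means for illustration only.
example = percent_impairment(gloved_mean=125.0, bare_mean=100.0)
print(example)  # 25.0
```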

Table 5

Mean Log Time (secs.) Required to Complete All
Tasks Under Each Hand Condition

                          Mean of
                          Log Time
Hand Condition            (secs.)       Impairment *
Goodrich                  2.1178           22.1
Goodrich and Liner        2.1614           25.2
Stanzoil                  2.1553           24.9
Stanzoil and Liner        2.2588           30.9
Liner                     1.9102           10.7

* To determine the relative amount of impairment produced
by the gloves, the percentage increase in time required to perform
on the tests relative to the bare hand was ascertained by
the following calculation:

\[ \%I = \frac{G - B}{B} \times 100 \]

where B = mean performance of the Ss under the bare hand
condition, G = mean performance of the Ss with a given glove
condition, and I = relative increase in time to perform.


Fig. 2. Mean skin temperature (°F) as a function
of duration, ambient temperature and hand condition.


The analysis of the data also indicates that
neither variations in ambient temperature between
25° and 100° F nor the amount of time
spent at these temperatures, 40 to 120 minutes,
significantly affected manual dexterity
as measured by the tasks. Although this re- 
sult might be reasonably accepted for 75° 
and 100° F, a further analysis of the data 
was required to ascertain how low tempera- 
ture exposure for reasonably long periods of 
time failed to affect dexterity. In the analy- 
sis that followed, the mean skin temperature 
was described as a function of the time spent 
at each of the four temperatures for the bare 
hand and for the over-all glove conditions. For 
each of the ambient temperatures,³ the mean
skin temperature (°F) was obtained over the 
course of 120 minutes for the bare hand and 
over-all glove conditions. Each mean skin 
temperature value represented in Fig. 2 for 
the bare hand was obtained from four Ss who 
spent 120 minutes in the climatic facility at 
each temperature. The mean skin tempera- 
tures for over-all glove conditions in the same 
figure were obtained from 20 Ss who also had 
been exposed for 120 minutes under each 
temperature condition. 
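
The impairment index defined in the footnote to Table 5 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, and the bare-hand mean is not reported in this part of the text, so it is back-solved here from the Liner row of Table 5 (mean log time 1.9102, impairment 10.7 per cent) rather than taken from the data.

```python
def percent_impairment(glove_mean: float, bare_mean: float) -> float:
    """Percentage increase relative to the bare hand: (G - B) / B * 100."""
    if bare_mean == 0:
        raise ValueError("bare-hand mean must be nonzero")
    return (glove_mean - bare_mean) / bare_mean * 100.0


def implied_bare_mean(glove_mean: float, impairment_pct: float) -> float:
    """Invert the footnote formula: B = G / (1 + %I / 100)."""
    return glove_mean / (1.0 + impairment_pct / 100.0)


# Liner row of Table 5 as read from the table (an assumption of this sketch).
liner_mean, liner_pct = 1.9102, 10.7

# Back-solved bare-hand mean; not a value reported in the text.
bare_mean = implied_bare_mean(liner_mean, liner_pct)

print(round(bare_mean, 4))
print(round(percent_impairment(liner_mean, bare_mean), 1))
```

Because the index is computed on mean log times, the percentages describe increases on the log scale and not increases in raw seconds.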

The data obtained at an ambient temperature of 25° F indicate
that the mean skin temperature for the over-all glove conditions
fell from about 70° F to 51° F in two hours. The bare hand data
indicate that a temperature of about 50° F is necessary for the
elicitation of the compensatory reflex vasodilation, which
resulted in an increase in skin temperature followed by a drop
to 47° F, when reflex vasodilation was again observed. The
existing literature indicates that significant impairment of
dexterity with the bare hand becomes manifest after exposure for
at least one-half hour to temperatures between 14° and 20° F, a
condition not employed in this experiment. It can be conjectured
that slightly higher temperatures acting over a longer period of
time would produce the same result. The data obtained in the
present experiment for the bare hand at an ambient temperature
of 25° F suggest that a critical skin temperature was being
approached, as evidenced by the reflex vasodilation when skin
temperature dropped to 50°-48° F. The skin temperature for the
over-all glove condition attained this value at the termination
of the experiment. We may conclude that the gloved hand in our
experiment afforded sufficient protection against loss of heat
so that the adequate stimulus required for elicitation of
compensatory reflex vasodilation never prevailed.

³ Equipment failure on one day precluded the measurement of skin
temperature when the ambient temperature was 50° F. Consequently,
the values in the figure for the bare hand and over-all glove
conditions are based upon 3 and 19 Ss, respectively.

Anthropometric considerations. Numerous measurements were made
of the hand of each S employed in the experiment in an effort to
relate the obtained physical dimensions to the values obtained
by Wright Air Development Center (3) personnel, who derived their
values from a sample of 4,000 recruits. These comparisons enabled
us to determine whether our sample of hand size dimensions was
comparable to those obtained from a much larger sample.

An examination of Table 6 reveals that there are no significant
differences between either the means or the variances of the
sample used in the study and the WADC report. This, therefore,
permits greater generalizability of the results of this study.


Table 6


Comparison of the Study Sample to the WADC Sample on Several
Anthropometric Hand Measurements


                                        Cml C (n = 72)         WADC (n = 4000)

                                        Mean      Standard     Mean      Standard
Dimension Measured                      (inches)  Deviation    (inches)  Deviation

Inside hand breadth at metacarpal       3.45      0.18         3.49      0.16
Length of hand volar from navicular     7.71      0.43         7.49      0.34
Length of middle finger, volar          3.33      0.22         3.25
Girth of wrist                          7.04      0.34         6.85      0.40
Thickness at metacarpal III             1.36      0.11         1.17      0.07
Length of phalanx I, middle finger      2.63      0.17         2.67      0.12
Diameter middle finger                                         0.86      0.05


Discussion


In developing handgear for industrial and military purposes,
design engineers are concerned with the problem of affording
adequate protection against toxic materials or environmental
extremes and minimal interference with the ability to perform
skilled manual tasks. The psychologist is frequently requested to
evaluate handwear from the standpoint of dexterity, and he tends
to rely upon standard laboratory measures of the dexterities. In
this study it was found that the dexterity tasks employed with
the bare hand yielded intercorrelations that were commensurate
with the results obtained previously by many experimenters, who
found low, insignificant correlations which suggested that the
tasks employed to measure dexterity were independent. However,
the results obtained with the gloved hand require some
discussion, because the military task employed did not correlate
significantly with any of the laboratory tasks. This raises the
question of the validity of an approach that employs laboratory
tasks solely in an effort to measure the effect of sundry
variables on dexterity as involved in the performance of military
tasks with a gloved hand. If the only useful validity criterion
of a test is that it predict performance or correlate
significantly with performance on a criterion task, and if we
accept the manipulation of the contents of the E28 CW agent
detector kit as a representative criterion task, then we must
conclude that our laboratory tests were not useful validity
criteria. This conclusion suggests that future evaluative studies
on dexterity should employ tasks that possess considerably more
face validity. Another alternative is suggested by an
interpretation of the results which would require that laboratory
tests be based upon manipulative skills actually derived from an
activity analysis of users employing the equipment.
If we consider manipulative skill as consist- 
ing of a large but finite population of dex- 
terities varying in the degree of complexity of 
motor responses and tactual kinesthetic sen- 
sory integration required, then we might gen- 
erate a hypothetical continuum that is nor- 
mally distributed from one extreme consisting 
of fine motor responses and complex tactual 
kinesthetic sensory integration to the other 
extreme of gross muscle movements and a re- 
duced requirement for tactual kinesthetic in- 
formation. With this paradigm in mind, one 
could posit that when the bare hand was em- 
ployed, although the four tasks were inde- 
pendent, still, the three laboratory tasks which 
required fine muscle movements and complex 
tactual kinesthetic sensory integration were 
at one end of the continuum and the military 
task which required gross responses and de- 
pended upon fewer tactual kinesthetic cues 
was at the other end of the continuum. Under the condition in
which a glove was worn, it can be assumed that tactual
kinesthetic information was reduced and that the range of
response patterns was constricted by the encumbrance enough to
eliminate the differences that existed among the laboratory
tasks when the bare hand was employed. However, the reduction in
sensory information and constriction of responses occasioned by
wearing of the glove did not significantly impair the
manipulation of the components of the military task, which
merely required gross movement and little tactual kinesthetic
information. Unfortunately, the design of the experiment did not
permit a direct test of this hypothesis.

Conclusions 


The following conclusions may be drawn 
from the results of the experiment: 


1. No significant correlations were obtained 
between any of the dexterity tasks when the 
bare hand was employed. 

2. Significant correlations were obtained 
between all of the laboratory dexterity tasks, 
but no laboratory task correlated significantly 
with the military task when the hand was 
gloved. 

3. Manual dexterity was significantly af- 
fected by the hand conditions of the experi- 
ment although the temperature conditions 
(25°, 50°, 75°, and 100° F) and the dura- 
tions of exposure (40, 80, 120 minutes) had 
no significant effect on dexterity. 

4. No significant difference obtained be- 
tween the two gloves (Goodrich and Stanzoil) 
evaluated with or without cotton liners. 

5. At an ambient temperature of 25° F, 40 
minutes were required to reduce the skin tem- 
perature of the bare hand to 50° F at which 
time a compensatory reflex vasodilation was 
stimulated. 


6. The hand measurements of the sample studied corresponded
closely to hand measurements made on 4,000 individuals by Air
Force personnel.


Four Techniques of Group Decision Making Under Uncertainty ¹ ²


Robert C. Ziller 


Fels Group Dynamics Center, University of Delaware 


Military aircrews frequently are compelled 
by circumstances to make decisions in situa- 
tions involving uncertainty and risk. For ex- 
ample, should the crew bail out or attempt to 
crash-land? Abort or continue the mission? 
Remain with the crippled aircraft and await 
rescue, or attempt to reach a distant Eskimo 
village? This study was designed to explore 
the group members’ reactions to four tech- 
niques of decision making in a mock situa- 
tion of this kind. 

Techniques of group decision making may 
be differentiated according to the degree to 
which the activity is group centered or leader 
centered. In the group-centered techniques, 
the decision evolves primarily from interac- 
tion of the group members; in the leader- 
centered techniques, the leader is largely re- 
sponsible for the decision. In general, it may 
be hypothesized that group members respond 
more positively to the decision and the de- 
cision-making processes under conditions of 
“self-determination,” that is, under the group- 
centered rather than the leader-centered tech- 
niques. 

However, the extent to which the principle of self-determination
may be generalized remains in doubt. For example, with regard to
business groups, Berkowitz (2) states that satisfaction with the
decision-making meeting may be low if the leader does not adopt
a role consistent with the group’s expectations. With reference
to the group decision-making procedures of military aircrews, it
seems unlikely that group members expect the leader to refrain
from attempting to influence the group. The leader is one of the
pilots of the plane who has extensive control over the lives of
the crew members by military law. Thus, the leader occupies a
position of power approached by no other crew member, and the
members characteristically look to him for support and assurance.

¹ This is an extension of a paper presented at the American
Psychological Association, New York City, September, 1954.

² This work was accomplished in 1954 when the author was a
member of the staff of the Crew Research Laboratory, Air Force
Personnel and Training Research Center, Randolph Air Force Base,
Texas. The views expressed in the report do not necessarily
represent those of the United States Air Force.

In this study an attempt is made to recon- 
cile these viewpoints by investigating the de- 
gree of self-determination most acceptable to 
military decision-making groups under condi- 
tions of uncertainty. 


Method 
Setting 


The experiment was conducted at the Strategic Air Command’s
Advanced Survival School, Stead Air Force Base, Reno, Nevada.
Each crew was housed in a separate tent which served as the
testing area.


Subjects


The Ss of the experiment were 45 B-29, B-50, and B-36 aircrews
comprising approximately 500 men. The crews ranged in size from
8 to 13 men; but, for the most part, they were 10- and 11-man
crews.

The Decision-Making Situation 


The decision-making incident was adapted from experiences of
several groups in survival situations. The following problem was
read twice by E to the group for their consideration and
decision.


A bomber crew was downed over Norway during the winter of 1944.
With the help of the underground, radio contact had been
established with friendly forces and a submarine had been
dispatched to pick them up at a given time and place off the
coast. In order to insure the safety of the submarine and
because of the danger of being spotted on the coast, the crew
delayed its dash to the coast as long as they dared to. As the
crew headed for the pick-up point, they became aware that enemy
troops were on their trail. The crew had reason to believe that
the pursuers were less than a day’s distance behind. At this
point, the crew arrived at a fiord which was about four miles
wide and on the other side of which was the pick-up point. The
fiord was covered with ice but because of the snow covering it,
it was impossible to tell how thick it was. Furthermore, no one
in the group knew the characteristics of fiord ice at that time
of the year. The distance around the fiord to the pick-up point
was about 15 miles of difficult terrain. Looking around, the
group saw a dwelling about eight miles away at the most inland
point of the fiord. Given only these conditions, what would your
decision have been?


While the task is ostensibly artificial, still the problem
possessed “face validity” since the study was conducted during
the Korean War at an Air Force school designed to teach bomber
crews to survive in the event of being downed anywhere in the
world.


Experimental Procedure 


Essentially, the experimental procedure involved the following
steps: (a) group orientation to the study; (b) completion of a
buffer questionnaire; (c) private orientation of the leader;
(d) presentation of the decision-making incident; (e) group
decision; and (f) completion of the decision-making
questionnaire.

While the crew members were occupied with a seemingly relevant
questionnaire (buffer questionnaire), the leader was requested
privately to leave the tent with the E. The experiment then was
described in greater detail and the leader was asked to adopt a
randomly assigned decision-making technique. The four techniques
studied may be called (a) authoritarian, (b) leader suggestion,
(c) census, and (d) chairman. In these techniques, opportunities
for reinforcement from the leader and the group were varied.
Under the authoritarian experimental condition, the leader alone
was responsible for the decision and, therefore, for the most
part was the sole source of support to the group members. Under
the leader-suggestion condition, the responsibility was shared
to a degree by the leader and the group members although the
leader was in a position to exercise greater influence. Under
the census condition, the responsibility was again shared, but,
in comparison with the leader-suggestion condition, the group’s
influence potential was increased while the leader’s influence
potential was decreased. Finally, under the chairman condition,
the responsibility for the decision rested almost entirely upon
the group exclusive of the leader, who acted as the committee
chairman. The techniques may best be described by the directions
given the leader.

Authoritarian. The AC’s (Aircraft Commander) decision is the
crew’s decision. There is no discussion of the problem among the
crew members. After the problem is read to the group by the E,
the AC takes as much time as is necessary up to a maximum of 15
minutes to consider the problem silently, then announces the
decision to the crew. The AC will indicate to the E that he is
prepared to submit the decision by saying that he believes that
everyone has had sufficient time to consider the problem. Then,
the decision is submitted by the leader without explanation or
discussion.

Leader Suggestion. The AC considers the problem silently for not
more than five minutes, then
opens the crew discussion by stating his opinion as 
to the best decision and asks the crew members what 
their ideas are. The group has a total of 15 minutes 
to reach a decision, after the time the discussion is 
opened by the leader. Following this group discus- 
sion, the AC submits the final decision. 

Census. The crew discusses the problem immediately with the AC
acting as chairman or discussion leader but deferring the
expression of his own opinion until well along in the discussion
or at least until it appears expedient to do so. After further
discussion, the AC submits the final decision. As discussion
leader,³ it will be the function of the AC to:

1. Help clarify the problem.

2. Maintain order.

3. Keep everyone involved in the discussion.

4. See to it that everyone’s opinion is given adequate
consideration.

5. Attempt to clarify the positions or ideas of different
contributors by restating their proposal or asking questions
which will lead the individuals to clarify their positions.

6. Keep the group goal oriented.

Chairman. This is the same as the census technique, except that
at no time does the AC express his own opinion as to the correct
solution or take sides in any way. He must remain neutral
throughout the discussion while acting as the discussion leader.
The decision must evolve from the group. It is the decision of
the group members excluding the chairman.

Thirteen crews were assigned randomly to the au- 
thoritarian technique, 12 to the leader-suggestion 
technique, 10 to the census technique, and 10 to the 
chairman technique. 

Under the authoritarian technique, the crew members remained in
their bunks throughout the problem. During the discussion period
under the other experimental conditions, the crew members were
seated on foot lockers arranged in a crude rectangular pattern.

Following the decision-making session, the crew members
completed a questionnaire designed to determine their degree of
satisfaction with the decision, the method of decision making,
their own participation, and their opinion as to the difficulty
of the task.

Measures of the Crew Members’ Reactions 


The questions designed to measure the crew members’ reactions to
the group decision and the decision-making process were largely
adopted from the Conference Research Studies of business groups
as reported by Berkowitz (2). The questionnaire items were as
follows:

1. How well satisfied were you with the decision reached?

2. How closely did you personally agree with the decision
reached?

3. How well satisfied were you with the method the crew used in
reaching the decision?

4. How would you compare the quality or accuracy of the decision
reached by your crew to that of other crews using any other
method?

5. Sometimes in a meeting a person wants to talk but feels,
somehow, that it would be better not to say anything. To what
extent did you feel this way?

6. Consider the entire decision-making session: “Everyone’s
opinion was given adequate consideration.”

7. Consider the entire decision-making session: “My opinion was
given adequate consideration.”

8. To what extent do you feel that the participants worked as a
unified group rather than as a disjointed collection of
individuals?

9. How difficult was the problem?

³ Most aircraft commanders had either attended an Air Force
school where conference leadership techniques were taught or
were otherwise familiar with the technique.

The nine items were arranged according to con- 
tent: Items 1 and 2 pertain to satisfaction with the 
decision; Items 3 and 4 to satisfaction with the de- 
cision-making technique; Items 5, 6, and 7 to satis- 
faction with participation during the discussion; 
Item 8 to satisfaction with the organization of the 
group during the discussion; and Item 9 to the diffi- 
culty of the problem. 

The group members’ responses were measured on an 11-point scale
(0 to 10). On all items a rating of 10 represented the most
favorable reaction. Each item was retained as a separate
criterion when it was established through correlation techniques
that the responses to the various items were relatively
independent.


Results and Discussion 


Since the nature of the decision may have been operating as an
intervening variable between the decision-making method and
group satisfaction, it was necessary to determine whether or not
there was a relationship between the nature of the decision and
group satisfaction before a test of the relationship between the
decision-making method and group satisfaction would be
meaningful. However, only the authoritarian and
leader-suggestion conditions resulted in a near even split
between decisions. Thus, a test of the relationship between the
nature of the decision and group satisfaction was possible only
with reference to these experimental conditions. Moreover, the
test was conducted only with regard to Items 1, 2, and 9, those
relevant to and available for both treatments.⁴

Concerning the authoritarian technique, the group making the
decision to cross the ice responded more positively on all three
items (t = 3.59, 3.73, and 2.55, respectively; p < .05 in all
cases). With regard to the leader-suggestion technique, there
was a trend in the same direction but the results were not
statistically significant.

This suggests that in a conflict situation involving risk, a
group discussion in contrast to the authoritarian approach tends
to lead to greater understanding of the alternatives and their
consequences and, subsequently, tends to result in relatively
equal satisfaction with whichever alternative the group selects.
This also indicates that the nature of the decision must be
controlled when analyzing the relation among methods and
satisfaction. Accordingly, only the groups which had decided to
cross the ice were included in the analysis. However, this
necessarily limits the degree of generalizability of the results
since this decision involves a greater degree of personal risk
(we will return to this point).

Analyses of variance of the groups’ ratings were made for each
of the nine questionnaire items. A group rating represents the
mean of the responses of the members of a crew.⁵ The results are
shown in Table 1.

Only the results with regard to Item 3 are statistically
significant. The least satisfaction with the method was
expressed by groups using the authoritarian technique. The most
satisfaction was expressed by members using the census
technique.

⁴ A less conservative estimate of statistical significance was
also calculated on which it was assumed that the individual crew
members’ responses were independent. The resulting means (n = 36
to 45) were essentially the same as those reported in Table 1.
However, the analysis of variance tests were found to be
significant with regard to Items 4, 7, and 9 as well as Item 3.

Another finding which is suggestive, although not statistically
significant, has reference to Item 9: “How difficult was the
problem?” Apparently, as the focus of the decision-making
process shifts from the leader to the group (that is, moves from
the authoritarian to the chairman technique), group
members perceive greater problem difficulty. Conceivably,
interaction increases when the group assumes greater
responsibility for the decision. Through interaction, the
complexities of the problem may unfold in greater detail,
thereby magnifying the differences between conflicting opinions
and increasing the difficulty of decision. (Note: these results
corroborate the hypothesis derived from comparing the relation
between decision and satisfaction with regard to the
authoritarian and leader-suggestion techniques.)

⁵ Several crews using the authoritarian method that arrived at
the decision to go around the fiord did not complete their
questionnaires properly. Consequently, the sample, with regard
to Items 3 and 4, was inadequate.


Table 1
Mean Group Responses to Questionnaire Items Under Four
Decision-Making Techniques Controlling for the Decision


                          Decision-making Techniques

                  Authori-     Leader
                  tarian*      Suggestion    Census     Chairman
Item              (n = 7)      (n = 6)       (n = 9)    (n = 8)

1. Sat-Dec          8.9          9.3           8.8        9.0
2. Agree            8.6          9.0           8.9        8.6
3. Sat-Meth**       6.9          8.6           9.2        8.5
4. Accuracy         7.3          8.0           8.0        8.3
5. Talk                          7.2           7.3        7.2
6. Opin-All                      8.9           8.8        8.4
7. Opin-My                       9.4           8.8        8.8
8. Unified                       8.7           8.6        8.9
9. Diffic           5.7                        4.3        4.1

* Items 5-8 did not apply to groups in this experimental
condition.
** Simple analysis of variance is significant at .05 level of
confidence.

It is also interesting to observe that mem- 
bers using the leader-suggestion technique 
expressed the greatest degree of satisfaction 
with their participation during the discussion 
(Item 7). This may suggest that members 
of hierarchical groups are less constrained in 
a discussion when they are informed as to the 
leader’s position on the issue in question. 

While the conditions of the fiord-crossing problem were
described in such a way that a point of indifference or conflict
was approached between decisions to cross or circumvent the ice,
subsequent analysis revealed that the decision reached was
apparently related to the decision-making technique employed.

In order to test the reliability of this observation, the group
decisions were dichotomized: (a) decisions to attempt to cross
the ice and (b) decisions to go around the ice. The techniques,
too, were dichotomized into leader-centered (authoritarian and
leader-suggestion) and group-centered (census and chairman)
categories. The frequency with which the leader- and
group-centered crews reached the two different decisions was
tabulated and chi-square analysis effected. The results are
statistically significant and demonstrate that, when the leader
was primarily responsible for the decision, the decision to go
around the ice was submitted about 46% of the time. However,
when the group members were primarily responsible for the
decision, the decision to go around the ice was submitted only
15% of the time (see Table 2).
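
The chi-square value and the two percentages can be checked directly from the frequencies reported in Table 2 (Leader: 11 go around, 13 go across; Group: 3 go around, 17 go across). The sketch below is an illustration, not the author's computation: it assumes a Pearson chi-square without continuity correction, which is the form that reproduces the reported value of 4.78, and the function and variable names are hypothetical.

```python
import math

# Observed counts from Table 2: (go around, go across) per focus of responsibility.
observed = {"Leader": (11, 13), "Group": (3, 17)}


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability for chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


table = (observed["Leader"], observed["Group"])
chi2 = chi_square_2x2(table)
p = p_value_df1(chi2)

# Share of crews in each category that decided to go around the ice.
share_around = {focus: counts[0] / sum(counts) for focus, counts in observed.items()}

print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
for focus, share in share_around.items():
    print(f"{focus}: {100 * share:.0f}% go around")
```

The resulting probability falls inside the interval reported with the table (p between .02 and .05), and the shares round to the 46% and 15% quoted in the text.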

In an attempt to interpret these findings,
the alternative decisions were re-examined.
The decision to cross the ice entails the risk 
of one or more crew members’ falling through 
the ice and almost certain death, but less 
danger of the enemy’s detecting the sub- 
marine. The decision to circumvent the ice 
entails the risk of missing the rendezvous 
with the submarine and detection of the sub- 
marine if the pursuers should divine the desti- 
nation of the crew and decide to risk crossing 
the ice in an effort to intercept them. The 
critical difference between these decisions may 
be the greater personal risk to the crew mem- 
bers involved in the decision to cross the ice. 

Table 2
Focus of Responsibility for Making the Decision Related to the
Nature of the Group Decision


                Group Decision

Focus        Go Around    Go Across
Leader          11           13
Group            3           17

Chi square = 4.78; p = .02-.05


The findings seem to suggest, then, that groups using
leader-centered techniques of decision making (in which the
leader has no knowledge of the group’s opinion prior to stating
his own opinion) are more reluctant than group-centered
decision-making groups to make a decision involving a risk of
the lives of the group members. Stated another way, the group
has greater license to make a “risky” decision since it is their
lives they are risking rather than the lives of others.

However, interpretations of the results of 
this study are qualified by the extent to 
which the role randomly assigned to the leader 
was consistent with his customary leadership 
role and the effects of incongruence as well 
as congruence. It is quite conceivable, for 
example, that leaders who characteristically 
make decisions for the group before consult- 
ing the members and who were assigned the 
“authoritarian” decision-making techniques 
described in this study may have stimulated 
different group reactions than “democratic”
leaders assigned the same technique. However,
because of the novelty of the task and
the environment (a field rather than the usual 
flight surroundings), it might be argued that 
the group's reaction to the leader’s behavior 
would be situationally determined to a high 
degree. 


Summary 


The experiment was designed to explore the group members’
reactions to four techniques of decision-making under conditions
of uncertainty and risk. Commanders of 45 aircrews comprising
approximately 500 men were randomly selected to use one of four
group decision-making techniques in a mock survival situation.
The decision was recorded by an observer and was followed by a
questionnaire designed to measure the group members’ reactions
to the decision and the procedures employed.

The findings and conclusions may be sum- 
marized as follows: 

1. In a conflict situation, when a group dis- 
cussion method of decision-making is involved, 
the members’ reactions to the alternatives are 
relatively undifferentiated in contrast to the 
condition in which the leader alone makes the 
decision for the group. 

2. While the results with regard to the as- 
sociation between the decision-making method 
and group satisfaction with the group proc- 
esses and products were generally inconclu- 
sive, the groups appear to be least favorably 
disposed toward the authoritarian technique 
of decision-making. However, as the focus of 
decision-making shifts from the leader to the 
group, group members perceive greater prob- 
lem difficulty. 

3. When the decision-making procedure is 
group centered rather than leader centered, 
the group reaches a decision involving greater 
personal risk to the members. 


Received January 24, 1957.


The present study is an attempt to develop a noninstructor
peer-oriented measure for evaluating medical students in their
third and fourth years when traditionally their training becomes
less academic and more clinical. Peer nomination procedures
focus attention on the individual and his ability to evaluate
the effectiveness of other group members. These procedures
should be distinguished from the usual sociometric devices in
which group dynamics and interpersonal relations are of primary
concern. A discussion by the author of the importance of this
distinction appears in a recent issue of the International
Journal of Sociometry and Sociatry (2).

A review of peer nomination studies (4) has shown that peers, as
unique observers of each other, can make evaluations which are
useful as criterion data. It is perhaps possible that in certain
areas of education they may be used for the identification of
those individuals who will be the most effective in a particular
course or field of activity.


Rationale 


Rationale for using peer nominations in 
education research may be summarized as 
follows: 

1. Students have more time to observe each 
other than do their instructors. They are 
constantly in the position of comparing their 
own behavior with that of their classmates 
and presumably operate from a different set 
of values and frame of reference. 

2. Instructor contact with the student in the third and fourth
years of medical school is brief. Each instructor sees more
students but sees each of them for a shorter length of time than
do instructors in preclinic years. Because of this, it is more
difficult for them to draw fine lines of distinction concerning
the student’s clinical performance.

3. Students know each other in a more informal social context
than exists between instructor and student and are able to
observe each other more candidly and more comprehensively.

¹ This study is part of the author’s doctoral dissertation
completed at the University of Pittsburgh (3). Financial support
of this research came from a research fellowship grant from the
United States Public Health Service, National Institute of
Mental Health.

² A report of this research was presented at the 1957 annual
meetings of the American Psychological Association in New York.


General Procedure 


The present study used nominations on the upper
segment, i.e., nominate three of the most effective
individuals on each of 11 physician variables. The
naval personnel study by Suci, Vallance, and
Glickman (5) favors the segment approach in terms of
reliability, simplicity, and lessened frustrating
influence on raters. No negative nominations were
asked for since preliminary tryouts with students in
the “ethics conscious” medical school situation
indicated little sympathy with nominating
ineffective individuals.

Nominating questions were stated in the future
tense so that students would feel they were rating
potential future behavior rather than the insecure
present in which they have examinations, evaluation
conferences, etc., to face before graduating. It was
felt that future nominations would be more
acceptable to the students. The naval studies,
previously cited (5), found that the peer
methodology least affected by academic performance
was the form concerned with “future officer”
performance.

The peer questions were related to the items on an
instructors’ observational record of students’ clinical
performance, developed earlier on the basis of
critical incidents. This was done so that the
evaluations by instructors could be compared with
student perceptions of the same items of behavior.

An attempt was made to control for the fact that
there are some students who are less widely known
among their classmates than others. It was felt that
this acquaintance factor would mean that an
effective popular student would receive a higher
nomination score than a perhaps equally effective
classmate who is less widely known. The peer
nomination questionnaire had an attached
alphabetical roster of class members. Accompanying
instructions asked students to draw a line through
the name of any student with whom “acquaintance has
been too slight for good judgments.”

The 11-item peer nomination scale was adminis- 
tered to the University of Pittsburgh School of Medi- 
cine senior class in cooperation with the Department 
of Medicine. Nominations were obtained from all 
members of the class (N = 87). The data were
obtained during the month before June, 1956
graduation.


Results 
Weighting vs. Non-Weighting 


One of the first questions that arises in 
working with peer nomination data based on 
the nomination of the three best individuals 
is the problem of weighting the first, second, 
and third choices. It is easy to assume that 
a first place nomination is worth more than 
a third place one. This assumption may be 
true to some extent, but what generally occurs 
in practice is that a large majority of a class 
agrees that one individual ranks first, but 
when you multiply these first place nomina- 
tions by some assigned weight, this individual 
receives a phenomenal score which stands him 
head and shoulders apart from the class. It 
appears likely that the considerable agree- 
ment which is exhibited serves only to show 
that the individual deserves first place, and 
not how far in front of the group he happens 
to be as the weighted score would indicate. 

In order to test this assumption,
intercorrelations were computed among the 11 peer
nomination variables using the following two
methods: (a) Weighted Scores (three for first
place; two for second place; one for third
place); (b) Nonweighted Scores (raw total of
all nominations received regardless of place).

Matrices for these two conditions are shown
in Table 1. The product-moment coefficient
was used in all computations. This table
shows that correlations obtained by weighting
nominations received differ very little
from correlations obtained without regard to
weights. Differences in correlation ranged
from .00 to .07 with a mean difference of .025.


Cluster Analysis of the Peer Nomination
Variables


A cluster analysis of the table of
intercorrelations following a technique described by
Fruchter (1) using the weighted
product-moment correlations revealed the following six
medical student peer nomination factors:
Intellectual Factor (A, E, and K); Personality
Factor (B); Confidence Factor (C, F, G,
and I); Remuneration Factor (D); Medical
Leadership Factor (H); and Friend and
Associate Factor (J).

It appears that further use of the peer
nomination technique with medical students
could be confined to the preceding six factors
without a substantial loss in coverage.


Table 1

Product-Moment Intercorrelations of Peer Nomination
Variables for Class Graduating 1956, Using Weighted
Scores (Above the Diagonal) and Unweighted Scores
(Below the Diagonal)

(N = 87)

                                       A     B     C     D     E     F     G     H     I     J     K
A. “Medical Facts”                    --    19    66    07  [OR]    78    50    05    53    21    83
B. “Bedside Manner”                   17    --    31    38    17    46    46    25    50    30    01
C. “Accurate Observations”            66    36    --    02    65    74    54    00    56    14    44
D. “Largest Income”                   06    34    00    --    07    06    04    38    03    06    03
E. “Diagnostic Skill”               [OR]    19    67 [=07]    --    74    48    04    51    21    86
F. “Family Physician”                 77    44    73    01    78    --    77    10    85    40    41
G. “Calm in Emergency”                47    41    52   -02    48    74    --    11    72    29    20
H. “Community Medical Leader”         06    29    04    36    05    13    16    --    18    14    02
I. “Turn Over Practice To”            53    50    55    00    55    85    72    20    --  [0)]    20
J. “Friend and Social Assoc.”         18    37    11    07    17    40    32    20    61    --    12
K. “Scientific Research Contrib.”     84    01    48    03   [4]    46    25    01    25    07    --

Note. Decimal points omitted. All correlations
positive except as noted.
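
The two scoring methods compared in Table 1 can be illustrated with a minimal sketch in Python. Only the weights (three, two, one) come from the text; the function name, the ballot layout, and the student names are invented for illustration.

```python
from collections import Counter

# Weights stated in the text: three for first place, two for second, one for third.
PLACE_WEIGHTS = (3, 2, 1)


def score_nominations(ballots):
    """Return (weighted, unweighted) totals for one peer variable.

    `ballots` holds one (first, second, third) tuple of names per rater.
    This layout is an assumption made for the example.
    """
    weighted, unweighted = Counter(), Counter()
    for ballot in ballots:
        for weight, name in zip(PLACE_WEIGHTS, ballot):
            weighted[name] += weight   # method (a): weighted score
            unweighted[name] += 1      # method (b): raw count of nominations
    return weighted, unweighted


# Hypothetical ballots from four raters.
ballots = [
    ("Adams", "Baker", "Clark"),
    ("Adams", "Clark", "Baker"),
    ("Adams", "Baker", "Davis"),
    ("Baker", "Adams", "Clark"),
]
weighted, unweighted = score_nominations(ballots)
```

In this invented example the two methods differ mainly in how far the most frequently chosen student stands ahead of the others, which is the effect the weighting discussion above describes.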
Influence of Slight Acquaintance


The total number of times a name was 
crossed out by the entire class ranged from 1 
to 32. The mean number of times a name 
was crossed out was 11.9. The standard 
deviation was 6.68. It was thought that this 
factor should be reflected in the individual 
scores. The following formula was devised:

\[
\text{Individual trait score} =
\frac{\text{Total No. Nominations Received}}
     {\text{Total No. Times Name Not Crossed Out}}
\times 100.
\]


This formula attempted to make use of in- 
formation concerning a student’s popularity. 
For example, with two students having an 
equal number of peer nominations, the stu- 
dent who is less widely known (name crossed 
out more often) will receive a higher trait 
score. Presumably, he would have received a 
higher score if more of his fellow students 
knew him. This method of computing the 
peer nomination trait score was compared 
with the nonweighted raw score method by 
calculating the product-moment correlation 
coefficient between the two types of
measures. These correlations ranged from .99 to
1.00. This scoring formula using the “slight 
acquaintance” information, in effect, does 
nothing more than apply a constant to all 
average or total scores. It does, however, 
serve the useful purpose for the students of 
helping them to feel that a student will not 
be penalized merely because he is not as well 
known as some of the other students. 
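
A minimal sketch of the trait score computation follows, in Python. The class size (87) is taken from the text; the two students, their nomination counts, and the function name are hypothetical, and whether a student's own ballot enters the denominator is not stated in the source, so the sketch ignores that detail.

```python
def trait_score(nominations_received, times_not_crossed_out):
    """Nominations received per 100 classmates who did not cross the name out."""
    if times_not_crossed_out <= 0:
        raise ValueError("no classmate felt able to judge this student")
    return 100.0 * nominations_received / times_not_crossed_out


CLASS_SIZE = 87  # N reported in the text

# Two hypothetical students with the same number of nominations.
well_known = trait_score(12, CLASS_SIZE - 5)    # name crossed out 5 times
less_known = trait_score(12, CLASS_SIZE - 30)   # name crossed out 30 times
```

With equal nominations, the less widely known student receives the higher trait score, as the example in the text states.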


Reliability 


If the peer nomination questions are clearly 
stated and individuals are working from simi- 
lar frames of reference, and, if the technique 
is reliable, nominations by one-half of the 
group should relate significantly to nomina- 
tions by the other half. This is in contrast 
to split-half test reliability where scores of a 
group on one-half of the test are correlated 
with scores of the same group on the second 
half of the test. For paper and pencil test 
reliability, where items are homogeneous, a 
correlation between comparable test halves is 
an appropriate and accepted measure of reli- 
ability. For determining the reliability of 
these nominations, therefore, a correlation
was obtained between comparable group
halves, since presumably the group was ho- 
mogeneous. Each peer variable was consid- 
ered as if it were an individual test, and 
separate reliability coefficients computed. In 
the present peer test, for each of the vari- 
ables which is dependent upon a frame of ref- 
erence which exists outside the personal sphere 
of the individual, i.e., where personal feelings 
of likes and dislikes are minimized and re- 
ciprocal choices are not involved, this meas- 
ure of reliability appeared appropriate. For 
the peer item which asked for “your best
friend,” it is unrealistic to expect a high split- 
half correlation. This variable appears de- 
pendent upon a personal preference factor 
that is irrelevant to reliability in the terms 
that we have defined. 

The split-half was obtained on the basis of 
odd-even numbered individuals listed in the 
alphabetical class roster. Spearman-Brown 
corrected coefficients ranged from .72 to .98 
with a mean of .89. 
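
The procedure can be sketched in Python as follows. The sketch assumes the nominations each student receives have already been tallied separately for the odd-numbered and even-numbered raters; the six pairs of counts are invented, and the Spearman-Brown step uses the standard formula 2r / (1 + r).

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-group correlation up to an estimate for the full group."""
    return 2.0 * r_half / (1.0 + r_half)


# Hypothetical nominations received by six students on one peer variable,
# counted separately for the two halves of the rater group.
from_odd_raters = [9, 4, 0, 6, 1, 3]
from_even_raters = [8, 5, 1, 5, 0, 4]

r_half = pearson_r(from_odd_raters, from_even_raters)
reliability = spearman_brown(r_half)
```

Each peer variable would be treated this way in turn, giving one corrected coefficient per variable.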


Peer Nominations by Students vs. Performance
Record Evaluations by Instructors


How do student evaluations by instructors
compare with peer evaluations on similar
performance dimensions by students? Student
peer nomination scores (total number of
nominations received) were correlated with
performance record scores (average instructor
ratings). Table 2 shows the results of this
comparison. Only three of the eight
correlations on specific variables were
significant. These variables, concerned with
knowledge of medical facts, patient-physician
relationships or “bedside manner,” and
diagnosis and follow-up, were significant at
the 5% level. There was practically no
relationship (+.03) between instructor judgment
of student integrity or recognition of own
limitations, and peer evaluations of this factor
as measured by the “turn over practice to”
variable. This presumes that a practice would
be turned over to that individual regarded as
most ethical.


Table 2

Relationship Between Peer Nomination Variables and
Similar Variables on the Clinical Performance
Record (Peer Evaluations vs. Instructor Evaluations)

(N = 41)

Performance                                                         Product-
Record             Peer                                             Moment
Variable           Variable               Key Statement             Correlation

1                  A                      “Knowledge”                         +.51*
2                  B                      “Patient-Physician Relationship”    +.40*
4                  C                      “Observing and Recording”           +.18
5                  [I]                    “Diagnosis and Follow-up”           +.40*
5                  [1]                    “Total Patient Problem”             +.15
6                  G                      “Emotional Stability”               +.21
[%]                [i]                    “Self-Improvement”                  +.18
7                  I                      “Integrity”                         +.03
Total Instrument   D                      “Largest Income”                    +.37*
Total Instrument   Total Instrument (a)                                       +.44**
Total Instrument   Total Instrument                                           +.48**

* Significant at the 5% level of confidence.
** Significant at the 1% level of confidence.
(a) Excludes “who will have the largest income” variable.


It is interesting to note that, prior to the
collection of data, it was anticipated that the
peer variable concerning who will have the
largest income would be a negative indicator
of physician effectiveness. The rationale for
this was the idea that a student nominated as
one who will have a large income is perhaps
not motivated by the most desirable and
high-minded principles to which physicians
normally aspire. The relationship between
this variable and the other peer variables
bears out this hypothesis since eight of the
ten intercorrelations vary between -.06 and
+.07. The other two correlations, both +.38,
also tend to confirm the hypothesis. These
are the “bedside manner” and “community
medical leader” variables. It is perhaps
possible that, to the students, the qualities
likely to lead to a large income are also the
“driving, politically aggressive” qualities of
the community leader, and the “smooth, suave”
qualities which contribute to a good bedside
manner.

Additional information on this question is 
contributed by the results of the comparison
between peer evaluations and instructor
evaluations. Following the line of reasoning
suggested in the preceding paragraph, the 
qualities associated with a high score on the 
large-income variable are, perhaps, also the 
kinds of qualities which can favorably im- 
press instructors. This is indicated by the 
correlation of + .37 (5% level of signifi- 
cance) between the peer “largest income” 
variable and the total instrument instructor 
evaluations. This suggests that a presumably 
undesirable quality is entering into the over- 
all instructor evaluations of students. It also 
suggests that peer nominations are less influ- 
enced by this factor. 


Relation between Peer Nomination Criteria 
and Course Grades 


The usual and most generally accepted 
measures of success in medical schools are 
course grades. Table 3 compares the final 
numerical grade in the major senior courses 
as well as the final numerical senior year av- 
erage, and over-all four-year medical school 
average with the peer nominations total score 
(excluding the “friend and associate”
variable). Several significant relationships
were obtained. Five of the seven coefficients
were significant at the 1% level with one
additional coefficient significant at the 5%
level. The correlations are large enough to
show that peer nominations and grades are
closely related (as well they should be) but
not so large as to reveal excessive overlap or
communality.


Table 3

Product-Moment Relationship Between Class Grades,
Peer Nomination Unweighted Total Scores,
and the Clinical Performance Record

                                  Peer            Clinical
                                  Nomination      Performance
                                  Total           Record
Senior Year Final                 Scores (a)      Scores (b)
Numerical Grade                   (N = 87)        (N = 41)

Medicine                          +.60**          +.58**
Pediatrics                        +.24*           +.40*
Psychiatry                        +.33**          +.39*
Surgery                           +.30**          +.16
Obstetrics                        +.12            +.07

Senior Year Average               +.40**          +.47**

Four Year Over-all Average (c)    +.62**          +.43**

(a) Represents the total unweighted peer nominations
received, excluding the “J” variable (“Friend and
Associate”).
(b) Correlations for Medicine and Pediatrics are
spuriously high because final department grade was
influenced by the Clinical Performance Record result.
This was true to a lesser extent in Pediatrics where
final grade was also dependent upon other data.
(c) Unweighted combination of the four yearly averages.
* Significant, 5% level.
** Significant, 1% level.

Some approximation of the reliability of 
medical school grades may be noted from the 
correlations reported in Table 3 between class 
grades and the clinical performance record. 
Here again, there are significant relationships 
but none that show an excessive overlap. 


Discussion 


It appears that instructor evaluations may 
not tell the whole story about student effec- 
tiveness. The instructor view is necessary 
but not sufficient for comprehensive student 
evaluations. The peer view has been shown 
to make a unique contribution to more com- 
prehensive evaluation. The question arises, 
however, about the place of peer nominations 
in the medical school, and indeed, in any pro- 
fessional school. Everyone recognizes that 
evaluation is the proper role for the instruc- 
tor, but is evaluation a proper role for the 
student? 

The first criterion to be considered in set- 
ting up a sociometric test, or any peer test, is 
the necessity for choosing a situation the
consequences of which make a difference to the
individuals involved. The same applies here. 
Students doubtless would not respond with 
meaningful information to a situation which 
is not only without benefit to them but also 
may reflect unfavorably on them. This would 
be the situation if nominations were used for 
routine administrative purposes. 

We should not take advantage of the
student’s unique position for observation to
obtain information which administrators and
teachers should obtain as their normal
function. Students would soon be motivated to
give an untruthful picture in order to rid
themselves of the burden of having to evaluate
themselves. This would be true even
though they are not asked to make negative
nominations. Though it requires little
imagination on the part of fairly sophisticated
students to know that the absence of a positive
nomination is comparable to a negative
nomination, there is very little objection to
the procedure. A negative vote by omission
appears less objectionable than a negative
vote by commission.

It seems that the best use of peer infor- 
mation is for research purposes especially in 
developing more comprehensive or diagnostic 
performance criteria. Valid information will 
be volunteered by students if assurances can 
be given (and meant) that the information 
will remain confidential and will, in no way, 
affect their class standing. 

It is expected that peer nominations will
serve as useful criteria of performance in
education research as they have in military,
industrial, and other research applications. It
is felt that their best area of usefulness in
higher education will be in situations where
personal qualities, personality, personal
interrelationships, and other intangibles are
involved, rather than scientific or technical
competence. For example, it does not appear
appropriate or necessary to ask peers in
a history class “who has the best grasp of
historical information?” There are easily
administered, and perhaps more valid, tests and
measures available for assessing this. An
appropriate area is one in which personal
qualities as much as technical know-how are
critical for success. Such uses might be in
professional schools, administration courses,
and preprofessional courses.


Summary and Conclusions 


Attempts to improve the selection of medical
students and revise medical education programs
have been hampered by the absence of
realistic measures of student quality during
their years in medical school. This is
especially true in the third and fourth years of
medical school when traditionally the student
program becomes less academic and more like
actual practice.

The present study was an attempt to develop,
try out, and analyze a sociometric peer
nomination criterion for evaluating medical
student effectiveness in medical school.
Students have more time to observe each other
than do their instructors, and have a unique
set of values and frame of reference for
evaluating each other's medical effectiveness.
Previous related research showed that peers
can make reliable evaluations which are useful
as criterion data. A peer nomination scale
was developed that requested nominations of
the three highest rated individuals on eleven
variables. These variables closely paralleled
those on a previously developed clinical
performance record. Thus, instructor and
student estimates were obtained on comparable
measures. The peer scale was administered to
the 1955-56 senior class (N = 87) at the
University of Pittsburgh School of Medicine
several weeks before graduation.

1. Over-all peer nomination reliability, us- 
ing the correlation between comparable class 
halves, was + .89. 

2. Weighted first, second, and third peer 
choices did not differ significantly from un- 
weighted choices. 

3. Peer effective physician results are rela- 
tively independent of the student buddy- 
friendship factor. 

4. Crossing out names of unfamiliar stu- 
dents (influence of slight acquaintance) had 
no effect when analyzed in the peer nomina- 
tion results but served a useful purpose in 
student perception of results. 


5. Correlation between over-all student peer
nominations and instructor evaluations on
comparable measures was +.44, significant
at the 1% level. It appears, nevertheless,
that each measure is making a unique
contribution to the total student variance.

6. A cluster analysis of peer nomination
intercorrelations showed that the 11 original
variables may be conveniently reduced to six
factors.

7. Correlations between peer nominations 
and grades in several of the major senior year 
medical school departments were all positive 
and, in most instances, significantly different 
from zero. 

8. It is suggested that because evaluation 
is not a proper role for the student in routine 
assessment, peer nominations be gathered and 
used only for research purposes with results 
having no effect on student standing. 


Received February 15, 1957.


Quickening and Damping a Feedback Display ¹, ²


Not long ago the pilot of an all-weather
interceptor was confronted with the emergency
of an inoperative attitude indicator. In such 
circumstances a pilot ordinarily makes use of 
the turn indicator to maintain straight-and- 
level flight. Since any change of heading is 
accompanied by an indication of a nonzero 
rate of turn, the pilot’s task is to compensate 
with his aileron control for fluctuations of the 
turn indicator. In this case, however, “the
pilot experienced great difficulty in maintain- 
ing straight-and-level flight because the needle 
in the instrument initially deflected in the 
wrong direction when the aircraft was rolling 
into turns” (2). 


Operation of the Turn Indicator 


This instrument is based upon the principle 
of gyroscopic precession. It contains a gyro 
assembly which precesses, or tilts, about its 
longitudinal axis (A) when a torque is ap- 
plied about its vertical axis (B). The force 
of precession is the same as the deflective 
force, and since the gyro assembly is spring- 
centered, its tilt angle is proportional to the 
deflective force. The deflection of the needle 
in the instrument display which results from 
this tilt therefore measures the torque applied 
about the vertical axis of the gyro assembly. 

If the axes of the gyro assembly are parallel
with the axes of the aircraft, a rotation of
the aircraft about its vertical axis applies a
torque about the vertical axis of the gyro
assembly proportional to the rate of turn. A
change in the aircraft’s heading, therefore,
will be accompanied by a precession of the
gyroscope, and consequently by a deflection
of the indicator needle proportional to the
rate of turn. If, however, the gyro assembly
is rotated about its lateral axis (C) so that
it is mounted with its longitudinal axis at an
angle to that of the aircraft, precession will
also accompany a change in the aircraft's
bank attitude in proportion to the rate of roll
and to the angle through which the gyro
assembly has been rotated in its mounting. In
this case, a deflection of the needle in the
instrument display will occur when the aircraft
rolls as well as when it turns. The proportion
of the needle’s deflection which results
from bank is greatest when the rate of roll is
highest. Thus as the aircraft rolls into a turn
this proportion is relatively high. Then as
the turn is established and the rate of roll
diminishes toward zero, more and more of the
needle’s deflection will be due to the change
in heading.


Fig. 1. Rotation about its principal axes of the gyro
assembly of an aircraft’s turn indicator.


1 This research was supported by the USAF under
Contract No. 33(616)-3000, monitored by the
Aero-Medical Laboratory of Wright Air Development
Center. Permission is granted for reproduction,
publication, use and disposal, in whole or in part,
by and for the United States Government.

2 Essentially this paper was presented at the
USAF-N.R.C. Symposium on Personnel, Training, and
Human Engineering Research, Washington, D. C.,
November, 1956. The authors at that time were on
the staff of the University of Illinois.

The deflection of the needle due to roll 
may or may not be in the same direction as 
the deflection due to turn. If the forward 
end of the gyro assembly is tilted down, the 
roll component of the needle’s deflection will 
be subtracted from the turn component, and 
the needle may actually be deflected in the 
direction opposite to that of the turn. This 
is what happened to the pilot in question. In
his aircraft the instrument panel is tilted 19°
forward. The turn indicator is mounted so
as to compensate partially for the slope of
the panel, but the gyro assembly is nevertheless
tilted 15° down at its forward end. By
virtue of its being tilted out of its normal
alignment, this instrument was sensing not
only the rate of turn of the aircraft, but also
its rate of roll. The tilt was in such a
direction that the two signals, rate-of-roll and
rate-of-turn, were of opposite signs when the
aircraft was establishing a normal turn. If
the after end of the gyro assembly had been
tilted down, the roll component would have
been added to the turn component instead of
subtracted from it. Then the result as the
aircraft rolled into a turn would be a more
rapid deflection of the needle in the direction
of the turn than would occur if the gyro were
not tilted.

One obvious solution to the problem cre- 
ated by the tilted instrument panel was to 
tilt the gyro assembly within the instrument 
case to compensate for the slope of the panel. 
However, the suggestion was made that the 
instrument might be further improved by 
overcompensating—tilting the mechanism in 
such a direction that the rate-of-roll signal 
would be of the same sign as the rate of turn. 
This would have the effect of “quickening” 
the display. 
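
The way tilt mixes roll rate into the turn indication can be sketched numerically. The Python below is an illustration only: it assumes the gyro responds to the component of the aircraft's angular velocity along its tilted sensitive axis, so that the signal is the turn rate times the cosine of the tilt plus the roll rate times the sine of the tilt. That projection, the sign convention (positive tilt meaning after end down), and the rates used are assumptions, not values from the study.

```python
from math import cos, sin, radians


def needle_signal(turn_rate, roll_rate, tilt_deg):
    """Rate sensed by the tilted gyro, in the same units as the inputs.

    Assumed convention: positive tilt = after end of the gyro assembly down,
    so the roll component adds to the turn component (quickening).
    """
    tilt = radians(tilt_deg)
    return turn_rate * cos(tilt) + roll_rate * sin(tilt)


# Hypothetical moment while rolling into a turn: the roll rate is high
# and the turn rate is still small (both in degrees per second).
turn_rate, roll_rate = 1.0, 10.0

untilted = needle_signal(turn_rate, roll_rate, 0.0)
quickened = needle_signal(turn_rate, roll_rate, 10.0)
forward_end_down = needle_signal(turn_rate, roll_rate, -15.0)
```

Under these assumptions the forward-end-down case gives a signal of opposite sign to the turn, which corresponds to the initial wrong-way deflection described above, while the after-end-down case gives a larger signal in the direction of the turn.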


Quickening


In a recent paper by Birmingham and
Taylor (1), a proposal is made that the
precision of a man-operated continuous control
system can be enhanced by “quickening” the
feedback. In a quickened display, a particular
indication occurs in response to a
given control action more rapidly than it
would in the corresponding unquickened
display. That is, the speed with which an
operator is provided with knowledge of the
results of a control action is increased by
quickening. The stability of the control
system, and consequently its precision, are
thereby augmented.

From the foregoing, it is clear that the dis- 
play of a turn indicator is quickened when 
the after end of its gyro assembly is tilted 
down, since then the deflection of the
indicator needle in response to a control action
occurs more rapidly. Tilting the forward end 
of the gyro assembly down results in what 
may be called “negative quickening,” since it 
changes the display in just the opposite way 
to quickening. The pilot of our all-weather 
interceptor suffered from negative quickening 
of his turn indicator display. 


Problem 


There could be little doubt of the undesir- 
ability of negative quickening, and we have 
seen how a solution to the problem raised by 
our pilot’s difficulty immediately suggested it- 
self. If the gyro assembly were rotated about 
its lateral axis until mounted with its longi- 
tudinal axis parallel to that of the aircraft, 
the negative quickening would disappear. 
But this very solution raised another ques- 
tion: why restrict the modification to the 
elimination of negative quickening?³ Why
not rotate further and introduce positive 
quickening? Once raised, this question re- 
solved itself to: “What degree of quickening, 
if any, would result in the best performance?” 
If quickening were to be introduced, however, 
another problem would arise. Since the indi- 
cator needle would respond more rapidly to 
rough air as well as to control action when 
quickened, would there not be an increased 
tendency for it to oscillate? Such a tendency 
could be reduced by damping the motion of 
the indicator needle. Our problem then could 
be phrased: “What combination of quicken- 
ing and damping in the turn indicator dis- 
play would optimize the performance medi- 
ated by that display?” 


Experimental Apparatus 


The apparatus employed in these studies 
consisted of a YF 102 flight simulator 
equipped with devices for measuring the per- 
formance of the man-machine system. 


The Simulator 


This is an analog computer which
continuously solves the flight equations of the YF
102 aircraft. It is equipped with an actual
YF 102 cockpit in which the instrument
display indications change continuously in
response to the operator’s control actions just
as they would in actual flight. The display
indications also vary according to a forcing
function which simulates atmospheric
turbulence. The degree of turbulence can be
varied arbitrarily. This function provides an
intermittent perturbation of the display
indications.

3 The authors are indebted for this suggestion to
George Purcell of the Flight Control Laboratory,
Wright Air Development Center.

The turn indicator. A system was incor- 
porated into the simulator by means of which 
the experimenter could vary the damping of 
the turn indicator needle and the simulated 
tilt of the gyro assembly’s longitudinal axis,
i.e., rotation of the gyro assembly about its 
lateral axis (C in Fig. 1). These factors, of 
course, were to be the independent variables 
under investigation. The simulated tilt was 
measurable in degrees of angular separation 
between the longitudinal axes of the aircraft 
and of the gyro assembly. A positive num- 
ber of degrees of tilt quickened the instru- 
ment display, and a negative tilt resulted in 
negative quickening. The amount of damp- 
ing was measurable as a percentage of “nor- 
mal” damping, i.e., the damping of the un- 
modified instrument. 

Rough air. The system which generated 
the forcing function to simulate atmospheric 
turbulence was modified so as to provide a 
continuous rather than an intermittent per- 
turbation. This was necessary in order that 
all performance could be measured under con- 
ditions of similar difficulty. Although the 
pattern of perturbation was not identical for 
each measurement, a sufficient number of
measurements were made under each
experimental condition so that we believe that
difficulty level did not vary systematically. Its
variation did, however, reduce the precision 
of our measurement to an unknown degree. 


The Measuring Apparatus 


Since the flight simulator is an electronic 
analog computer, it is possible to measure 
electrical potentials at different points in the 
system which are at any instant proportional 
to the various parameters of the flight
equations. Measuring apparatus was installed
which could detect potentials proportional to
simulated altitude, heading, pitch attitude,
and bank attitude. The pitch and altitude 
channels were included merely to insure that 
the subjects did not neglect their altitude con- 
trol. Primary interest centered in the direc- 
tional channels. 

The difference between the simulated value 
of a parameter and the corresponding desired 
value is the error. The output signals of the 
measuring apparatus were potentials propor- 
tional to the time integrals of the squared 
errors. These signals were accumulated dur- 
ing a thirty-second measurement period. At 
the end of that period they were measured 
with a voltmeter. 
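
The measurement described above can be sketched numerically. The following is a minimal illustration only, assuming sampled readings in place of the analog potentials; the function name, the sample values, and the sampling interval are hypothetical and do not come from the original apparatus.

```python
def integrated_squared_error(actual, desired, dt):
    """Approximate the time integral of squared error with a rectangle rule.

    actual, desired: equally spaced samples of a flight parameter.
    dt: sampling interval in seconds (an assumption; the original
    apparatus integrated continuously on an analog computer).
    """
    return sum((a - d) ** 2 for a, d in zip(actual, desired)) * dt


# Hypothetical heading samples (degrees) taken every 0.5 s; desired heading is 0.
heading = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]
target = [0.0] * len(heading)
ise = integrated_squared_error(heading, target, dt=0.5)
```

Because the error is squared before integration, large excursions are weighted more heavily than small ones.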


Preliminary Investigation 


When the apparatus had been developed, 
three preliminary experiments were carried 
out. The purpose of these was the determi- 
nation of the most favorable quickening and 
damping values. In all three experiments the 
Ss were asked to maintain a constant heading 
and altitude by means of the elevon (aileron 
and elevator) control of the simulator. It 
will be noted that in this task the rate of 
turn of the aircraft is determined by its bank 
angle (the effect of atmospheric turbulence 
being neglected), which is therefore a more 
sensitive index of directional control than is 
heading. The measurements obtained were of 
potentials proportional to the time integrals of 
the squared discrepancies between actual per- 
formance and the performance which would 
have occurred in straight and level flight. 

The Effect of Quickening 

The first preliminary experiment was de- 
signed to show the effect of tilting the gyro 
in such a manner as to produce positive quick- 
ening. Four different angles of tilt of the in- 
strument’s gyro assembly were tested. These 
were 0°, 2½°, 5°, and 10°. Normal damping 
was used throughout. The intensity of rough 
air simulation was set at 25% of the system’s 
capacity. Eight jet-qualified Air Force pilots 
served as Ss. Approximately twenty minutes 
of familiarization time was used to acquaint 
each S with the characteristics of the simu- 
lator. Each S was tested four times in each 
condition, and 

Fig. 2. The effect of quickening upon system output. 


such as to equalize the mean amounts of prac- 
tice among quickening levels. The incidence 
of superior performance in both the bank and 
heading channels, as measured by the appa- 
ratus described above, was found to increase 
with quickening. (By “superior performance” 
is meant any performance resulting in a meas- 
urement smaller than or equal to the median 
of all measurements made in the same chan- 
nel.) Since the increase from 5° of quicken- 
ing to 10° was slight, the less radical de- 
parture from conventional instrumentation 
was chosen for further study. 
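
The definition of "superior performance" given above can be illustrated as follows. This is a sketch under stated assumptions: the condition labels and readings are hypothetical, and the pooling of all measurements within one channel follows the parenthetical definition in the text.

```python
from statistics import median


def superior_incidence(measurements_by_condition):
    """Proportion of measurements in each condition at or below the
    median of all measurements pooled within the same channel."""
    pooled = [v for values in measurements_by_condition.values() for v in values]
    cutoff = median(pooled)
    return {
        condition: sum(v <= cutoff for v in values) / len(values)
        for condition, values in measurements_by_condition.items()
    }


# Hypothetical error readings for one channel under two tilt angles.
readings = {"0 deg": [9.0, 8.0, 7.0, 6.0], "5 deg": [5.0, 4.0, 6.0, 3.0]}
incidence = superior_incidence(readings)
```

A smaller measurement means a smaller integrated squared error, so a higher incidence indicates better control.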


The Effect of Damping 


Having determined that tilting the gyro to 
produce positive quickening would result in 
better performance, a second preliminary ex- 
periment was undertaken to assess the effects 
of damping. The quickening produced by 5° 
of tilt was used in all conditions. Three 
damping levels were tested. They were: 
100%, 300%, and 500% of normal damping. 
Two levels of rough air were used: 25% and 
100%. Six jet-qualified Air Force pilots 
served as Ss. Each S was tested eight times 
in every condition, four times at each rough 
air level, and mean amounts of practice were 


equalized among damping levels. Within the 
range of the experimental conditions, the in- 
cidence of superior performance increased 
regularly with damping level, both in bank 
and in heading. 


The Limits of Improvement 


Performance on the quickened turn indi- 
cator was best at the highest level of damping 
employed in the second preliminary experi- 
ment. The third preliminary experiment was 
undertaken to determine if still further im- 
provement could be obtained by extending 
the range of quickening and damping even 
further. Four conditions of quickening and 
damping were employed: (a) No quickening 
and normal damping; (b) 5° positive quick- 
ening and 500% of normal damping; (c) 5° 
positive quickening and 700% of normal 
damping; (d) 10° positive quickening and 
700% of normal damping. Rough air was 
set at 25% for the first 16 measurements and 
at 100% for the second series of 16 measure- 
ments, and the order of conditions was coun- 
terbalanced within each series. Eight Air 
Force pilots were tested. Five were jet-quali- 
fied and three were not. When inspection re- 
vealed no reliable differences between the 
groups, the data were combined. Each S was 
tested eight times in every condition, four at 
25% rough air and four at 100%. The re- 
sults showed that no further benefit was to be 
gained by increasing damping and quickening 
beyond 500% and 5°, respectively. The bene- 
ficial effect of quickening is further established 
by these data, as is that of damping. The 
superiority of the 5°-500% indicator over 
the standard indicator is, in fact, substantially 
greater than would have been expected on the 
basis of the previous assessment of either the 
quickening or the damping effect, or of both 
together. 


Test of the Optimum Combination 


The optimum combination of quickening 
and damping to emerge from the preliminary 
experimentation was 5° of tilt and 500% of 
normal damping. This combination was then 
tested more elaborately in comparison with 
the standard turn indicator. Nine jet-quali- 
fied Air Force pilots served as experimental 
Ss. The measurements were made in the YF- 
102 simulator, by means of the apparatus de- 
scribed above, in three different conditions. 
In each of these conditions the Ss were given 
the task of maintaining a simulated altitude 
of 20,000 feet and a heading of North by the 
use of the elevon control alone. The in- 
tensity of rough air simulation was 50% of 
the system’s capacity. 

The three conditions were (a) Reference: 
full panel with standard turn indicator; (b) 
Control: full panel with standard turn indi- 
cator and with attitude indicator covered; and 
(c) Experimental: full panel with experi- 
mental turn indicator and with attitude indi- 
cator covered. These conditions were pre- 
sented in the following series: a, b, c, c, b, a, 
b, c, c, b, a, .... This series was continued 
until no fewer than three complete cycles * 
had been presented and until no measure- 
ment obtained in the final cycle was beyond 
(i.e., larger than) the capacity of the measur- 
ing apparatus. In this fashion Ss were trained 
to a common criterion in the performance of 
the experimental tasks, and the criterion was 
such that all performance in the final cycle of 
the series was subject to measurement. Al- 
though the measuring apparatus would yield 
a signal when performance did not meet the 
criterion, the only meaning which could be 
attached to such a signal was that the per- 
formance measured was not within the ca- 
pacity of the measuring apparatus. 
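
The order of presentation described above can be sketched as a repeating pattern. This illustration assumes that consecutive cycles share their boundary measurement under condition a, which is one reading of the series as printed; the function names are hypothetical.

```python
from itertools import cycle, islice


def condition_series(n):
    """First n conditions of the series a, b, c, c, b, a, b, c, c, b, a, ..."""
    return list(islice(cycle("abccb"), n))


def final_cycle(series):
    """Return the last six measurements, which must begin and end with a."""
    window = series[-6:]
    if len(window) < 6 or window[0] != "a" or window[-1] != "a":
        raise ValueError("series does not end on a complete cycle")
    return window


# Under the shared-boundary assumption, three complete cycles take 16 measurements.
series = condition_series(16)
last = final_cycle(series)
```

Each cycle of this form contains two measurements under each of the three conditions, and the mirrored order balances practice effects across conditions within a cycle.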


Results and Analysis 


The square roots of the measurements ob- 
tained from each S during the final cycle were 
computed. Since each of the measurements 
consisted of an electrical potential propor- 
tional to the time integral of the squared 
error committed by the S during the 30-second 
measurement period, these error indices were 
proportional to the error root mean squares. 
Of the six measurements obtained from each 
S in the final cycle, two were obtained under 
each of the three conditions. 
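
The proportionality between the error index and the root mean square error can be checked with a short sketch. The readings below are hypothetical; only the 30-second period is taken from the text.

```python
import math

MEASUREMENT_PERIOD_S = 30.0  # length of the measurement period given in the text


def error_index(ise):
    """Square root of the integrated squared error, as used in the analysis."""
    return math.sqrt(ise)


def rms_error(ise, period=MEASUREMENT_PERIOD_S):
    """Root mean square error over the measurement period."""
    return math.sqrt(ise / period)


# Hypothetical integrated-squared-error readings in arbitrary units.
readings = [3.0, 12.0, 48.0]
ratios = [error_index(x) / rms_error(x) for x in readings]
```

The ratio is the same for every reading (the square root of the period), so comparisons among error indices are equivalent to comparisons among RMS errors.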

Inspection did not disclose any prominent 
effects of the independent variables on the 
error indices computed from measurements in 
the pitch and altitude channels. Accordingly, 


* A cycle consists of any six measurements begin- 
ning and ending with condition a, the full panel 








Malcolm L. Ritchie and Harold E. Bamford, Jr. 


Table 1 

Analysis of Variance of Heading Error Index 

Source of Variation        MS 
Subjects                   9.06 
Indicators                 4.70 
Trials                     4.99 
Indicators × trials         .67 
Error                      2.32 


only the data from the two channels reflecting 
directional control were studied further. The 
error indices computed from measurements 
obtained under the experimental and control 
conditions in these channels were subjected 
to analysis of variance. The analyses are 
summarized in Tables 1 and 2. As is made 
apparent in those tables, much of the vari- 
ability of the error indices is the effect of in- 
dividual differences. However, due to the de- 
sign of the experiment, which provided that 
each S be tested twice under each condition, 
the effect of individual differences does not 
contaminate our estimation of the effects of 
conditions and trials. 

As can be seen in Fig. 5, the mean error 
index in each channel increased when Ss were 
deprived of the attitude indicator. The in- 
crease was less under the experimental condi- 
tion than under the control condition. The 
difference between the two conditions failed 
of statistical significance in its effect on the 
heading error index, but the superiority of 
the experimental turn indicator over the 
standard turn indicator with respect to the 
bank error index was significant at the .05 
level of confidence. The lesser reliability of 
the effect on heading error is explained by the 


Table 2 

Analysis of Variance of Bank Error Index 

Source of Variation        MS 
Subjects 
Indicators 
Trials 
Indicators × trials 
Error 


Fig. 5. Mean error index obtained under experi- 
mental and control conditions as a percentage of 
mean error index obtained under reference condition, 
for the bank and heading channels. 


greater sensitivity of bank angle as an index 
of directional control. 


Discussion and Conclusions 


The results of the experiments which have 
been reported demonstrate the favorable effect 
upon human regulatory performance of cer- 
tain modifications in a feedback display. An 
understanding of those modifications will be 
aided by a study of Fig. 6. In this figure, 
there is schematized a simplified model ⁵ of 
the man-machine system whose output was 
measured in our experiments. 

The two major components of the model are 
the pilot and the aircraft. The diagram por- 
trays the dynamic relations between the in- 
put (rough air) at the left and the output 
(heading, or ψ) at the right. A box contain- 
ing an integral sign represents an integrator— 
an element whose output is the time integral 
of its input. Since rough air will impart an 
angular acceleration to the aircraft, the out- 
put of the first integrator in the model is an 
angular velocity. This velocity has two com- 
ponents, one about the longitudinal axis of 
the aircraft (a) and one about the vertical 
axis (b). 

A circle containing a cross represents a 
mechanical or electrical differential which 
adds algebraically. That is to say, the out- 
put of one of these elements is the algebraic 

⁵ In this discussion, the authors are indebted to 
Henry P. Birmingham and Franklin V. Taylor for 
their helpful suggestions and criticism. Responsi- 
bility for all statements made in this paper, however, 
is solely the authors’. 

Fig. 6. A simplified model of the experimental man-machine system. 


sum of its inputs. Thus the aircraft’s rate 
of roll, φ̇, is the algebraic sum of a and c. 
And its rate of turn, ψ̇, is the algebraic sum 
of b and d, where d is the time integral of the 
aircraft’s roll rate, i.e., the bank attitude of 
the aircraft. (As we have noted above, for a 
given airspeed the bank attitude of the air- 
craft determines its rate of turn, if the per- 
turbation of atmospheric turbulence be dis- 
counted. ) 

The time integral of turn rate is heading. 
Thus, from input to output the system con- 
tains signals in four time orders—heading, 
and its first three derivatives. 

Heading is indicated to the pilot in a feed- 
back display and, along with the indication in 
a derivative feedback display, determines his 
control action. Each feedback channel con- 
tains an amplifier of adjustable gain, repre- 
sented within the pilot’s box in Fig. 6 by a 
triangle. Wherever amplifiers are present in 
channels leading to a differential, the gains of 
those amplifiers determine the relative weights 
of the tributary signals in the determination 
of the differential’s output. The gains of the 
two amplifiers in the pilot’s box are the 
weights assigned by the pilot to the two feed- 
back display indications in the determination 
of his control action. 

In this simplified model, a force by the 
pilot on the aileron control imparts an angular 
acceleration to the aircraft about its longi- 
tudinal axis. The time integral of this ac- 
celeration is an angular velocity, c, which as 
we have seen is a component of rate of roll. 
But c is fed back into the control action chan- 
nel. This negative feedback represents the 
damping action of the aircraft’s wings, since 
the air exerts a retarding force against the 
wings proportional to the rate of roll. The 
gain of the amplifier in the feedback channel 
is determined largely by the aircraft’s wing 
area. Because of this feedback, to say that 
a control action by the pilot establishes a roll 
rate would involve only a negligible over- 
simplification. That is, only transitory an- 
gular accelerations can be effected by control 
action. 
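
The statement that a control action in effect establishes a roll rate can be illustrated with a small numerical sketch of the damped roll loop. The gain, the control input, and the step size are hypothetical values chosen for illustration, and simple Euler integration stands in for the analog integrator.

```python
def simulate_roll_rate(control_accel, damping_gain, dt=0.01, steps=1000):
    """Integrate roll acceleration with negative feedback from roll rate.

    control_accel: angular acceleration commanded by the pilot (held constant).
    damping_gain: gain of the feedback channel representing wing damping.
    Returns the roll-rate history (signal c in the model).
    """
    roll_rate = 0.0
    history = []
    for _ in range(steps):
        net_accel = control_accel - damping_gain * roll_rate  # differential
        roll_rate += net_accel * dt  # integrator
        history.append(roll_rate)
    return history


# Hypothetical values: the rate settles near control_accel / damping_gain.
rates = simulate_roll_rate(control_accel=2.0, damping_gain=4.0)
steady_rate = rates[-1]
```

With these values the net acceleration dies away within about a second, leaving a constant roll rate proportional to the control input.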


Quickening 


The S’s task in this experiment is to main- 
tain a null error signal. Although this task 
might be accomplished through the media- 
tion of the ψ error feedback loop alone, it 
would be difficult. An additional, tighter, 
feedback loop is therefore provided. In this 
loop, a weighted composite of ψ̇ and φ̇ is com- 
puted and fed back to the pilot. The gains 
of the amplifiers in the ψ̇ and φ̇ feedback 
channels determine the relative contributions 
of the two signals. Any tilt angle of the gyro 
assembly in its mounting is represented by 
the gain ratio of the two amplifiers. The 
gain of the φ̇ amplifier is zero when the axes 
of the gyro assembly parallel those of the air- 
craft. As the after end of the gyro assembly 
is tilted down, the gain of the φ̇ amplifier in- 
creases relative to that of the ψ̇ amplifier. 
Tilting the gyro assembly in the opposite di- 
rection will have the reverse effect upon the 
gain ratio. 
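
The relation between tilt angle and gain ratio can be sketched geometrically. The cosine and sine weights below are an assumption (the projection of the aircraft's angular velocity onto a tilted sensitive axis); the text states only that the roll-rate gain is zero at zero tilt and grows with positive tilt. The function names are hypothetical.

```python
import math


def indicator_gains(tilt_deg):
    """Assumed gains of the turn-rate and roll-rate channels for a gyro
    assembly tilted tilt_deg from the aircraft's axes."""
    tilt = math.radians(tilt_deg)
    return math.cos(tilt), math.sin(tilt)


def quickened_signal(turn_rate, roll_rate, tilt_deg):
    """Weighted composite of turn rate and roll rate shown to the pilot."""
    turn_gain, roll_gain = indicator_gains(tilt_deg)
    return turn_gain * turn_rate + roll_gain * roll_rate


# Hypothetical rates in degrees per second.
untilted = quickened_signal(3.0, 10.0, tilt_deg=0.0)
quickened = quickened_signal(3.0, 10.0, tilt_deg=5.0)
```

Under this assumption a negative tilt gives a negative roll-rate gain, which corresponds to the negative quickening described in the text.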

A gain ratio corresponding to positive 
quickening is found to increase the effective- 
ness of the feedback display in directing con- 
trol action which will reduce system error. 
The quickening resulting from a rotation of 
the gyro assembly through a 5° angle ap- 
pears to be optimum. 


Damping 


In the diagram of the φ̇-ψ̇ feedback loop 
there is depicted a secondary feedback loop. 
It is thus that we represent the damping of 
the turn indicator needle. The motion of the 
needle is damped by a retarding force pro- 
portional to the needle’s velocity. The box 
in this loop containing the symbol d/dt repre- 
sents the differentiation with respect to time 
of the φ̇-ψ̇ signal. The resulting derivative 
passes through an amplifier whose gain repre- 
sents the experimental damping level. The 
effect is to reduce the relative contributions 
to changes in the feedback display indication 
of the various components of the feedback 
signal in direct proportion to their velocities. 

We have seen that the φ̇ signal has two 
components—c resulting from control action 
and a resulting from rough air. We have 
further seen that changes in c are damped by 
the aircraft's wings. Accordingly, the damp- 
ing of the indicator needle’s movement has 
relatively little effect upon c, since that signal 
only changes during brief intervals. Changes 
in a, on the other hand, are more sharply at- 
tenuated, since their occurrence is more pro- 
longed. 

The result of this damping is a display in- 
dication which changes more smoothly, with 
less random fluctuation, or jitter. The effec- 
tiveness of the φ̇-ψ̇ feedback display in direct- 
ing control action which will minimize system 
error is found to increase with increased 
damping of the indicator needle, up to about 
500% of normal damping. 
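
The smoothing effect of needle damping can be illustrated with a first-order lag. This is a stand-in chosen for simplicity, not the authors' model: it assumes the needle moves toward the input at a rate inversely proportional to a damping constant, and the input and constants are hypothetical.

```python
def damped_needle(signal, damping, dt=0.01):
    """Needle position driven toward the input signal, with velocity
    limited by a retarding force proportional to needle velocity."""
    needle = 0.0
    positions = []
    for x in signal:
        needle += (x - needle) * dt / damping
        positions.append(needle)
    return positions


def peak_to_peak(values):
    return max(values) - min(values)


# Hypothetical jitter: an input that alternates sign on every sample.
jitter = [1.0 if i % 2 == 0 else -1.0 for i in range(2000)]
light = peak_to_peak(damped_needle(jitter, damping=0.05)[-100:])
heavy = peak_to_peak(damped_needle(jitter, damping=0.25)[-100:])
```

In this sketch the heavier damping leaves a visibly smaller residual oscillation, which corresponds to the smoother display indication described above.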


Summary 


1. Difficulty in controlling an all-weather 
interceptor under emergency conditions was 
explained as the result of negative quickening 
of the indicator display inadvertently intro- 
duced by the use of a standard instrument in 
an aircraft with a sloping instrument panel. 

2. In a preliminary experiment, the inci- 
dence of superior performance by experienced 
pilots in simulated flight was found to in- 
crease with positive quickening. 

3. In a second preliminary experiment, the 
incidence of superior performance was found 
to increase regularly with the level of damp- 
ing of the motion of the indicator needle. 

4. A third preliminary experiment con- 
firmed the findings of the first and second 
and revealed no further improvement in per- 
formance with increase in quickening beyond 
that resulting from 5° of positive tilt of the 
instrument’s mechanism, or increases in damp- 
ing level beyond 500% of normal. 

5. The final experiment demonstrated the 
superiority of the optimum combination over 
the standard turn indicator. The 5°-500% 
combination showed a smaller decrement of 
performance upon attitude indicator failure 
than did the control instrument. 

6. The experimental findings are discussed 
in relation to a simplified model of the man- 
machine system studied in the experiments. 


Contrary opinion and evidence exist con- 
cerning the fakability of forced choice instru- 
ments such as the Gordon Personal Profile. 
Rusmore (6) reported little practical differ- 
ence in the mean scores when the Profile was 
administered to 81 college psychology stu- 
dents twice, first under instructions that the 
students were to imagine themselves apply- 
ing for a job which they want and second un- 
der instructions that the students were to 
imagine they were seeking guidance. Rus- 
more thus examined whether students would 
fake results, not whether they could fake re- 
sults if motivated to do so. However, Long- 
staff and Jurgensen (4) and Mais (5) both 
found that students could significantly alter 
results on the Jurgensen Classification In- 
ventory, a similar forced choice instrument, 
if the students were instructed to make as 
good an impression as possible in the simulated 
employment screening situation. 

Gordon and Stapleton (1) attempted to 
simulate a more realistic employment situa- 
tion. High school students were told that if 
they desired employment at the end of the 
school year, and wished assistance in obtain- 
ing such employment, they should complete 
the Gordon Personal Profile. The tests were 
administered in the classrooms, and applica- 
tions were “accepted through the municipal 
school system.” It remains uncertain whether 
the students interpreted the testing like a 
guidance situation or like a true employment 
situation where the students would apply for 
a specific job in a specific company. Yet, 
even with this somewhat ambiguous experi- 
mental variation, Gordon and Stapleton found 
significant increases on the Responsibility and 
Emotional Stability scales when students actu- 
ally took the examination in application for 
“assistance in obtaining employment” and “as 
part of a guidance program.” However, the 
correlations between the two administrations 
were almost as high as the reliability of the 


four scale scores. But in view of the am- 
biguity of the experimental manipulations, 
the authors properly were cautious about gen- 
eralizing their results to the industrial selec- 
tion situation. 

There seems to be no question that actual 
job applicants fake traditional personality in- 
struments. Thus, Heron (3) noted that ap- 
plicants for a job as bus conductor did fake 
emotional stability scores on a conventional 
type self-report personality schedule when 
“employment” and “research” situations were 
contrasted. Likewise, Green (2) found that 
applicants for police positions produced more 
favorable scores than those already selected. 

If applicants are motivated to fake as 
Heron and Green’s studies suggest, can they 
readily fake a forced-choice instrument? The 
present report concerned how well sales ap- 
plicants fake the Gordon Personal Profile in 
contrast to performance of already employed 
salesmen. Faking should be high for real 
applicants with a specific job in mind. If the 
forced choice inventory is fakable, we should 
find a high degree of faking. 


Method and Results 


The Gordon Personal Profile was administered to 
265 sales employees of a national food distributor as 
part of a research program. It was also given to 
471 applicants for employment with this company. 
Table 1 lists the mean differences in performance on 
the Profile as well as the mean intelligence, age, and 
educational levels of the two samples. Compared to 
applicants, the employed sample was older (approxi- 
mately 7.6 years), but about the same in intelligence 
(Wonderlic, Form D) and in amount of education. 

One-tailed t tests were appropriate to estimate the 
statistical significance of the differences of the Profile 
scores since we hypothesized initially that applicants 
would be higher in Profile scores than employees. 
The mean differences were actually as much as five 
times their estimated standard errors on the Re- 
sponsibility and Emotional Stability scales. Appli- 
cants scored significantly higher at the 1% level of 
confidence on all but the Sociability Scale. On So- 
ciability, the difference was significant only at the 
5% level. 
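
The comparison of a mean difference with its standard error can be sketched from summary statistics. Only the two sample sizes are taken from the text; the means and standard deviations below are hypothetical, and the normal approximation to the one-tailed probability is an assumption that is reasonable only for samples of this size.

```python
import math
from statistics import NormalDist


def mean_difference_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Ratio of a mean difference to its estimated standard error, with a
    one-tailed probability from the normal approximation."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    ratio = (mean_a - mean_b) / standard_error
    p_one_tailed = 1.0 - NormalDist().cdf(ratio)
    return ratio, p_one_tailed


# Hypothetical scale means and standard deviations for applicants (N = 471)
# and employees (N = 265).
ratio, p = mean_difference_test(8.0, 4.0, 471, 6.2, 4.5, 265)
```

The test is one-tailed because the direction of the difference (applicants higher) was hypothesized in advance.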
Table 1 

Differences Between Sales Applicants and Sales 
Employees in Performance on the Gordon 
Personal Profile and in Selected 
Personal Characteristics 

                        Applicants     Employees      Mean 
Variable                (N = 471)      (N = 265)      Difference 

Intelligence              25.8           24.9            .9 
Age                       28.0           35.6           7.6** 
Education                 13.3           12.9            .4 
Ascendency                 6.8            5.8           1.0** 
Responsibility            10.6            8.9           1.7** 
Emotional Stability        7.8            6.0           1.8** 
Sociability            [illegible]    [illegible]        .5* 

* p < .05 
** p < .01, one-tailed t test 


Table 2 shows the correlation between intelligence 
test score and Gordon Personal Profile scores for 
each sample. The significantly larger correlation of 
.27 in the applicant sample between Ascendency and 
intelligence suggests that brighter applicants do a 
better job, or are motivated to do so, in faking the 
Ascendency Scale. Otherwise, intelligence per se 
played no role. This result might be quite different 
among applicants for another type of job. For ex- 
ample, the brighter applicants for a bank cashier job 
might “steer” their responses to reach higher Re- 
sponsibility rather than Ascendency scores. 

It should be remembered that the obtained differ- 
ences are probably attenuated by the tendency of al- 
ready employed workers to make themselves look 
good on company-sponsored questionnaires. On the 
other hand, some of the differences may have been 
valid differentiation due to real personality differ- 
ences between applicants and employees, a more re- 
stricted, selected group. However, we found no re- 
lation between test scores and the rated success of 
the employees, reducing, at least, the possibility that 
the employees are a select group with reference to 
the personality scales, for turnover among these em- 
ployees is highly related to rated merit. Nor were 
the employees more homogeneous than applicants on 

Table 2 

Correlations Between Intelligence and Response to 
the Gordon Personal Profile 

                        Applicants     Employees 
                        (N = 471)      (N = 265) 

Ascendency                .27*           .05 
Responsibility            .03            .01 
Emotional Stability       .01            .05 
Sociability               .03            .02 

* p < .01 


these personality scales. Yet, the employees were 
significantly older. Within the sample, age was 
negatively related to some extent with performance, 
correlating .21 with Ascendency, .09 with Re- 
sponsibility, -.13 with Emotional Stability, and 
.20 with Sociability. Some of the applicant-em- 
ployee differences in personality score could have 
been due to their average difference in age. 


Conclusions 


Inspection of the actual responses of appli- 
cants identified the source of faking. Appli- 
cants practically never earn a minus value 
on any response while employees often do. 
Applicants never indicate as most like them- 
selves some derogatory alternative in a tetrad. 
For example, if given the tetrad: 


highly sociable 
lacking in confidence 
very thorough 

easily upset 


applicants will almost always check as most 
like themselves “highly sociable” or “very 
thorough,” and least like themselves, “lacking 
in confidence” or “easily upset.” Actually, 
they are being “forced” with pairs of choices. 
They will earn from 0 to 2 points on each of 
the four scales measured instead of from -2 
to +2 points. Greater range in response 
among applicants can be obtained in several 
ways. For example, four complimentary state- 
ments can be used in a tetrad of more subtle 
items. 


A Measure of Supervisory Quality 


Harley W. Mowry 


University of Houston ¹ 


That quality of supervision is directly re- 
lated to worker productivity seems to have 
been fairly well established by the Haw- 
thorne studies (29, 32) and subsequent ex- 
periments (13, 14, 16, 17). During the era 
of “scientific management,” quality of super- 
vision was generally regarded as important 
only in relation to policing activity, since the 
primary determinants of productivity were 
considered to be tools, methods, and pay. 

As noted by the Hawthorne experimenters 
in reference to uncontrolled variables appar- 
ently responsible for the phenomenal increases 
in productivity: “The most important of these 
inadvertently introduced changes was the new 
method of supervision” (29, p. 179). At this 
time, these findings are less than surprising 
in view of the degree to which the worker is 
dependent upon his supervisor for the satis- 
faction of important needs and the relation- 
ship between leadership and follower behavior 
(19, 20, 21). 

Since the approach to supervision found 
effective in these studies is based upon an un- 
derstanding of how to satisfy worker needs 
and orient toward organizational objectives, 
it follows that this understanding should be 
considered in predicting supervisory success. 

The best known effort toward developing 
an instrument to measure such insight is that 
of File and Remmers, whose How Supervise? 
(10) is purported to measure a supervisor's 
“knowledge and insight concerning human re- 
lations in industry” (see also 8, 9, 26, 31). 

How Supervise? consists of a list of state- 
ments of principles, practices and beliefs, some 
of which are to be indicated as desirable (D), 
undecided (?), or undesirable (U), and others 
as agreed (A), uncertain (?), or disagree 
(DA). 

¹ Now consultant to industry in the Los [Angeles] 
area. Member, teaching staff, University of Cali- 
fornia, Extension. 

It appears likely that a person could con- 
siderably improve his test score by increasing 

his knowledge of acceptable do’s and don'ts 
without gaining the insight necessary to sig- 
nificantly improve his ability to supervise. 
Decker (5) found that How Supervise? may 
measure “supervisory knowledge,” but that 
knowledge as measured by How Supervise? 
(or whatever it measures) is not a factor in 
achieving success as a supervisor. 

Measurement of supervisory knowledge in- 
stead of insight could also account for the re- 
sults of Karn (15), who found significant 
gains in scores of a group completing a psy- 
chology course which stressed the principles 
of management of human relationships, and 
Wickert (33), who found that the gains in 
scores before and after training corresponded 
closely to the human relations content of the 
training. 

That How Supervise? may be measuring 
verbal intelligence appears to be indicated by 
the findings of Millard (24), who obtained 
correlations of .71 and .62 between How Su- 
pervise? and the Adaptability Test, Form A, 
for factory supervisors in a textile plant and 
supervisors on a metropolitan newspaper. 
Both of these correlations were significant at 
the 1% level. 

After dividing a group of candidates for 
shop supervisory positions into upper and 
lower educational groups, Wickert (34) ob- 
tained correlations of .20 for the upper group 
and .65 for the lower group between the Ad- 
vanced Short Form of the California Test of 
Mental Maturity and How Supervise? 

How Supervise? may measure intelligence 
at the lower levels and supervisory knowledge 
at the upper levels. 


Method 


To develop a successful measure of what may be 
called “supervisory insight,” it seems preferable to 
present typical supervisory problems similar to the 
way they would occur in a real-life situation, that is, 
with each problem structured to a degree that en- 
sures necessary reliability, but with considerable lati- 
tude for projection and with several conflicting pos- 
sible choices for solution. Of course, each of these 
alternatives should be attractive enough so that each 
might be acceptable under certain conditions, but in 
the problem at hand the subject must select the one 
best answer after considering several conflicting fac- 
tors and not merely parrot what he has read or 
heard. 

Typical important problems encountered by indus- 
trial supervisors in their everyday human relations 
on the job were taken from case study material used 
by a large industrial organization for training super- 
visors. 

Cases were selected on the basis of the typicalness 
of the problem, its importance to effective super- 
vision, and the opportunities it offered for analysis 
in depth. 

From 10 to 15 questions with multiple (five) 
choice answers were written in an a priori manner 
for each of the 13 cases selected. The total 150 ques- 
tions were based on the author’s knowledge of re- 
search findings in the fields of psychology and per- 
sonnel management and his work experience. 

This initial form, entitled Supervisor's Problems, 
was administered to about 200 supervisors in several 
companies representing such industries as steel proc- 
essing, metal fabricating, oil refining, chemical pro- 
duction, railroading, and rubber production. 

One hundred of these were found useful for an in- 
ternal consistency analysis using the author’s key as 
criterion. Items were retained primarily on the ba- 
sis of difficulty level and correlation with total score 
(6, 18). Also influencing the selection was the as- 
sumption that having more cases and fewer questions 
per case would provide greater reliability by reduc- 
ing the possibility that a bizarre set on one case 
might unduly influence the total score. 
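
The two item statistics named above, difficulty level and correlation with total score, can be sketched as follows. The response matrix is hypothetical, the scoring of each answer as 1 (agrees with the key) or 0 is an assumption, and the correlation is left uncorrected, that is, each item is included in the total with which it is correlated.

```python
import math


def item_difficulty(responses, item):
    """Proportion of examinees whose answer to the item agrees with the key."""
    return sum(row[item] for row in responses) / len(responses)


def item_total_correlation(responses, item):
    """Pearson correlation between item score (0 or 1) and total score.
    Items answered identically by everyone have no variance and would
    need to be screened out before calling this function."""
    scores = [row[item] for row in responses]
    totals = [sum(row) for row in responses]
    mean_s = sum(scores) / len(scores)
    mean_t = sum(totals) / len(totals)
    cov = sum((s - mean_s) * (t - mean_t) for s, t in zip(scores, totals))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_t = sum((t - mean_t) ** 2 for t in totals)
    return cov / math.sqrt(var_s * var_t)


# Hypothetical keyed responses: rows are examinees, columns are items.
responses = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
difficulties = [item_difficulty(responses, i) for i in range(3)]
r_item1 = item_total_correlation(responses, 1)
```

Items of middling difficulty with high item-total correlations are the usual candidates for retention in an internal consistency analysis of this kind.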

The resulting test, Supervisor’s Problems, Form 
AA, consists of eight problems and 50 items. 


Results 


All of the results obtained with this 50- 
item form of the test are reported here. 

Management ratings were used as criteria 
in each of these studies. In the chemical 
plant five members of management identified 
the 12 “good” and the 11 “poor” supervisors 
in a meeting. The difference between means 
of these two groups was significant at the 1% 
level (t = 3.34). 

In the study conducted with oil company 
office supervisors (2), two superiors and the 
experimenter rated each supervisor as “suc- 
cessful” or “less successful.” The difference 
between means was significant at the 5% level 
(t = 2.87). 

In the missile factory study, supervisors 
were rated as “most effective in handling peo- 
ple” and “least effective in handling people” 
Table 1 
Supervisor’s Problems Scores of Supervisors in 
Three Companies 


Oil 
Company 
Office 


Missile 
Factory 


Chemical 
Plant 


N 40 10 25 

Range 14-42 17-33 22-39 
Mean 33.75 25.50 29.89 
Sigma 6.33 5.46 4.29 


Correlations with 


ratings r(bis)* = .53 r = .62 r(bis)* = .74 


Note. Total population of supervisors in the chemical plant, 
60; office supervisors in the oil company, 40; factory super- 
visors in the missile organization, 41. 

* Biserial r from widespread classes. 


in each of four departments by the Works 
Manager. The difference between means of
each four is significant (t = 5.00).

In the chemical plant study, Supervisor’s 
Problems scores were found to correlate with 
the Otis, r = .33, N = 25. In the missile
study, the correlation between Supervisor’s 
Problems and the Language Subtest of the 
Advanced Short Form of the California Test 
of Mental Maturity was found to be r = .39, 
N = 25.

Supervisor’s Problems scores and F-Scale 
scores correlated r = -.49, N = 40. Edu-
cation was found to be related to Supervisor’s
Problems, r = .49, for the same group.

The Rulon (30) split-half reliability of Su-
pervisor’s Problems was found to be .81, N
= 40.
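
The Rulon split-half coefficient reported above can be illustrated with a
minimal sketch. It uses the standard Rulon formula, one minus the ratio of
the variance of the half-score differences to the variance of the total
scores. The half-test scores below are hypothetical and are not data from
this study; the function name is likewise an assumption made for
illustration.

```python
from statistics import pvariance


def rulon_reliability(half_a, half_b):
    """Rulon split-half reliability: 1 - var(difference) / var(total)."""
    diffs = [a - b for a, b in zip(half_a, half_b)]
    totals = [a + b for a, b in zip(half_a, half_b)]
    return 1 - pvariance(diffs) / pvariance(totals)


# Hypothetical half-test scores for five examinees (illustration only).
odd_half = [12, 15, 9, 20, 17]
even_half = [11, 16, 10, 18, 17]
print(round(rulon_reliability(odd_half, even_half), 3))
```

The closer the two halves agree for each examinee, the smaller the variance
of the differences and the nearer the coefficient comes to 1.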

Discussion 


It may be noted that the relationship be- 
tween Supervisor’s Problems scores and man- 
agement ratings is found to be substantial in 
each of these studies with the highest corre-
lation existing with ratings based on ability
to “handle people.” Since insight must be
inferred from overt behavior, this latter cri- 
terion comes closest to describing what this 
test is designed to measure, and probably ex- 
cludes some of the variance associated with 
technical knowledge or clerical ability, which 
may have influenced the “good” and “poor,” 
“successful” or “less successful” ratings used 
as criterion measures in the other two studies. 

Although verbal intelligence is a common 
factor influencing the ratings and Supervisor's
Problems scores in these studies to some de- 
gree, the relationship between Supervisor's 
Problems scores and intelligence test scores 
makes it apparent that Supervisor’s Problems 
measures something else. That it measures 
what management was rating as essential to 
effective supervision is indicated by the magni- 
tude of the relationship between ratings and 
test scores. 

The relationship between F Scale (1) scores 
and Supervisor’s Problems scores suggests that 
supervisory insight as measured here is sub- 
stantially related to the democratic-authori- 
tarian dimension of personality with the more 
democratic supervisors possessing the greater 
insight. This relationship may be inferred 
from other studies (7, 13, 14, 16, 17, 22, 23, 
25, 29, 32). 

The above relationships also seem to indi- 
cate that Supervisor’s Problems measures a 
factor associated with education not meas- 
ured by tests of verbal intelligence. 

Several studies have demonstrated that edu- 
cation lessens prejudice (3, 4, 11, 28), but a 
more recent study by Kahn (12) seems to in- 
dicate that such a decrease is superficial and 
not due to a real personality change, but 
merely an ability to mask prejudice. 

If our educational system does in fact carry 
out one of its basic stated aims, that of teach- 
ing democratic values, one may expect an in- 
crease in supervisory insight as well as lower 
F-Scale scores with higher education.


Conclusions 


It seems reasonable to conclude that: 

1. The test, Supervisor’s Problems, differ- 
entiates significantly between supervisors high 
in understanding of how to effectively super- 
vise and those low in this understanding. 

2. This understanding, as measured by Su-
pervisor’s Problems, has a low relationship
with verbal intelligence as measured by the
Otis and the Language subtest of the Ad-
vanced Short Form of the California Test of
Mental Maturity.

3. This understanding is substantially re-
lated to the democratic-authoritarian dimen-
sion of personality as measured by the F Scale.

Received March 22, 1957 


Relative Pilot Aptitude and Success in Primary Pilot
Training ¹


John D. Krumboltz

Michigan State University

and Raymond E. Christal

Air Force Personnel and Training Research Center


If a cadet with medium aptitude for flying 
is placed in a group of high aptitude cadets, 
would he be more likely to fail than if he had 
been placed in a group of low aptitude cadets? 
The question may be put another way. Does 
a flying instructor have an absolute frame of 
reference in judging which cadets pass and 
which fail, or does he have a relative frame 
of reference so that his standard of what is 
acceptable varies with the quality of students 
he is instructing? It might be that an in- 
structor would fail the worst student in his 
group even though the worst student in his 
group might have been the best student in 
some other group had the groups been formed 
differently. This problem is especially acute 
in pilot training since each instructor usually 
has only four students. If the four students 
are randomly assigned to instructors, and if 
a relative frame of reference operates, then 
chance factors would contribute to the prob- 
ability of passing. One student grouped with 
highly talented fellow students might fail 
while another student of equal or even less 
ability might pass because he had happened 
to be placed with students of low ability. If 
such a condition prevails, the Air Force is not 
getting the best possible pilots, deserving men 
are failing, and the true validity of the pilot 
stanine is not being estimated accurately. 
The present study was designed to determine 
whether such a phenomenon exists. 

It seems reasonable to suppose that one’s 
frame of reference shifts in accordance with 


¹ This report is based on work done under ARDC
Project No. 7719, Task No. 17009, in support of the
research and development program of the Air Force
Personnel and Training Research Center, Lackland
Air Force Base, Texas. Permission is granted for re-
production, translation, publication, use, and disposal
in whole and in part by or for the United States
Government.


the quality of the material to be judged. 
Such a supposition has been confirmed by 
psychological research dealing with adaptation 
levels and frames of reference (2, pp. 377-
389). The generalized result of a number of
studies is that individuals tend to form their 
standards of judgment from the nature of the 
objects to be judged. For example, if a cer- 
tain weight is judged to be heavy, it is be- 
cause that weight is heavier than some esti- 
mated average weight (adaptation level, in- 
difference point, or neutral point) which was 
determined on the basis of previous weights 
lifted. A weight of five pounds seems light 
to a man who has been lifting fifty pound 
weights but heavy to a man who has been 
lifting five ounce weights. 

The same sort of phenomenon has been 
found in a variety of situations. The beauty 
of a picture, the wickedness of a crime, the 
pleasantness of a color, and the loudness of 
a sound are just a few examples of how judg- 
ments are subject to the frame of reference 
of the observer (3, 4). 

There is already some evidence that in-
structors in primary pilot training do not
possess an absolute frame of reference in 
judging the quality of student pilots. In one 
study (1) it was found that 70% of the stu- 
dents with no previous flying training passed 
when they were grouped with each other, but 
only 49% passed when they were in groups
with students who had prior light plane
training.


Procedure 


The records of one primary pilot training 
school over a six-year period of time (Classes 
52F through 57H) were utilized to obtain
cases for the present study. To achieve a
relatively homogeneous sample, the only cases
included in the study were aviation cadets in 
instructional groups of four. Any group con-
taining a cadet held over to a later class
was excluded from the study. Instructional
groups containing one or more student offi- 
cers (AF ROTC graduates) were not analyzed 
because of a possible instructor bias for or 
against student officers. Groups containing 
student officers only were not numerous 
enough to justify extensive study. A total of 
54 instructional groups containing 216 avia- 
tion cadets met the requirements for this 
study. 

The criterion of success consisted of the 
dichotomy of pass or fail in primary pilot 
training. All men eliminated from training 
and not held over to a later class were classi- 
fied in the fail category regardless of the 
stated reason for their elimination. 

Each student’s relative pilot aptitude (RPA) 
score was determined in the following manner. 
Students’ names and their pilot stanine scores 
were first arranged in accordance with the 
actual instructional grouping that had oc- 
curred during primary pilot training. Then 
for each man the mean pilot stanine of the 
other men in his group was calculated. Each 
man’s own pilot stanine minus the mean 
stanine of the other men in his group plus a 
constant of ten (to eliminate negative num-
bers) constituted his RPA score. A high
score, therefore, indicated that a man had 
relatively more aptitude than the average of 
the other men in his group. A low score in- 
dicated that he had relatively less aptitude 
than others in his group. 
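
The scoring rule just described can be sketched in a few lines. This is an
illustration only: the function name and the stanines in the example group
are hypothetical, while the rule itself (own stanine, minus the mean stanine
of the other group members, plus a constant of ten) is taken from the text.

```python
def rpa_scores(group_stanines, constant=10):
    """Return each member's relative pilot aptitude (RPA) score.

    RPA = own stanine - mean stanine of the other group members + constant.
    """
    total = sum(group_stanines)
    n = len(group_stanines)
    return [s - (total - s) / (n - 1) + constant for s in group_stanines]


# Hypothetical instructional group of four cadets (stanines invented).
group = [5, 7, 3, 9]
print(rpa_scores(group))
```

A cadet whose stanine equals the mean of his three groupmates receives
exactly the constant, 10; scores above 10 indicate relatively more aptitude
than the rest of the group, and scores below 10 relatively less.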

Within each stanine level the men were 
ranked by RPA scores and divided according 
to whether they passed or failed in pilot train- 
ing. The distribution of RPA scores was split 
approximately at the median for each stanine 
level. Therefore, the men of each stanine
level were divided into four categories: high 
RPA and pass, high RPA and fail, low RPA 
and pass, low RPA and fail. 

It was hypothesized that within each stanine 
level cadets with high RPA scores would be 
more likely to pass than cadets with low RPA 
scores. Furthermore, it was hypothesized that 
this would be more true in the middle range 
of talent than at the extremes. For purposes 
of statistical analysis, these hypotheses were
translated to statistical null hypotheses as in- 
dicated below. 

The primary null hypothesis was that within 
each stanine level the proportion of passing 
cadets in the high RPA group is the same as 
the proportion of passing cadets in the low 
RPA group. 

The second null hypothesis was that there 
is no difference in the proportion of passing 
cadets among stanines. This hypothesis, as 
stated, was not basic to the study but was 
included to provide for a test of an interac- 
tion effect—the third hypothesis. 

The third hypothesis was that there is no 
interaction between stanine level and RPA 
scores. That is, the proportion of passing 
cadets in the high and low RPA categories is 
the same for each stanine level. 

Wilson (5) has described a method for 
computing tests of analysis of variance hy- 
potheses with nonparametric data. This tech- 
nique was utilized for testing each of the 
above hypotheses. The .05 level of confi-
dence was chosen for rejection of the null
hypotheses. 


Results 


The number of cases falling in each RPA 
and pass-fail category by stanine is presented 
in Table 1. The test of each hypothesis is 
reported in Table 2. 

Data reported in Table 2 led to the rejec- 
tion of the first hypothesis. The relative pilot 
aptitude of a man within his instructional 
group was found to be significantly related to 
his chances for success in primary pilot train- 
ing. In general, a cadet had a better chance 
of success if he was grouped with cadets of 
relatively lower aptitude than himself rather 
than with cadets of relatively higher aptitude. 

The second hypothesis was accepted since 
the proportion of cadets passing in each 
stanine did not differ sufficiently to reach the 
required significance level. It should be 
noted that the test of this second hypothe- 
sis was not a sensitive one. It failed to 
take into account the linear trend for higher 
stanine levels to be associated with greater 
proportions of passing cadets. The validity of 
the pilot stanine itself is revealed by the bi- 

Table 1

Number of Aviation Cadets Falling in Each RPA and Pass-Fail Category by Stanine

[Table body not recoverable from the source scan. The only legible cells are
High RPA, 13; Low RPA, 11; Total, 24, in a single column whose heading is not legible.]

P = pass; F = fail.


serial correlation of the stanine with the pass- 
fail criterion which is reported in Table 3. 
The second hypothesis was included in the 
analysis to isolate the source of variation due 
to stanine level and to provide for a test of 
the interaction between RPA categories and 
stanine levels. 

No significant interaction was observed. It 
was originally hypothesized that a cadet’s 
relative pilot aptitude would have more ef- 
fect on cadets in the middle range of aptitude 
scores than on cadets toward either extreme. 
A tendency in this direction can be noted. 
Inspection of Table 1 reveals that the effect 
of RPA standing on success is more pro- 
nounced in stanines 4, 5, 6, and 7 than it is 
in stanines 3, 8, and 9. In fact, there is a 
slight reversal in direction in stanine 8. How- 
ever, such differences in the effect of RPA 
standing at different stanine levels were too 
slight to produce a significant interaction 
effect. 
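
The three decisions can be checked against the chi-square values given in
Table 2. The sketch below is hedged: the degrees of freedom are not legible
in the source, so it assumes 1 for the RPA category (two categories) and 6
each for stanine level and interaction (seven stanine levels, 3 through 9).
The helper chi2_sf is a hypothetical name; it evaluates the upper-tail
chi-square probability in closed form for df = 1 and for even df only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or even df)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("only df = 1 or even df are handled in this sketch")


# Chi-square values from Table 2; the df values are assumptions (see above).
tests = {
    "RPA category": (5.443, 1),
    "Stanine level": (11.408, 6),
    "Interaction": (2.199, 6),
}
for name, (chi2, df) in tests.items():
    p = chi2_sf(chi2, df)
    print(f"{name}: chi2={chi2}, df={df}, p={p:.3f}, reject={p < 0.05}")
```

Under these assumed degrees of freedom, only the RPA category falls below
the .05 level, which agrees with the decisions reported in the text.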

The biserial validity of the RPA scores 
against the pass-fail criterion was .412, while 
the validity of the pilot stanine was only .348. 


Table 2

Summary of Tests of Hypotheses

No Difference
Due to            Chi Square

RPA category        5.443
Stanine level      11.408
Interaction         2.199


Note to Table 1. Three cadets with stanine 2 scores and two cadets with stanine 1 scores
are not included in that table because of the low frequencies involved. All five failed
in primary pilot training.


It is obvious that the validity of the pilot 
stanine was attenuated by the instructors’ 
relative frames of reference which introduced 
irrelevant variance into the criterion. With- 
out this attenuation, the validity of the pilot 
stanine would be identical to the validity of 
the RPA scores. This is true since the RPA 
scores are in reality nothing more than pilot 
stanines, adjusted for differences in group
means.

The RPA score can be analyzed in another 
manner by breaking it into its two com- 
ponents: (a) the pilot stanine score, and (b) 
the mean pilot stanine of the other three men 
in the group. Table 3 reports the inter-cor- 
relation of these two components along with 
their validity for the pilot training pass-fail 
criterion. Although the second component
has a validity of only .134, it raises the
validity of the pilot stanine from .348 to .414


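Table 3

Intercorrelations of RPA Components with Pass-Fail
in Primary Pilot Training

[Correlation values are not legible in the source scan. Rows: 1. Pilot stanine;
2. Mean of other three men. Columns include Pass-Fail.]

* Biserial correlations. Pass = 1; the code for Fail is not legible.
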
when the two components are combined in a
multiple correlation formula. It may be ob- 
served that this multiple correlation is identi- 
cal to the validity of the RPA score itself. 
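
The multiple correlation referred to here can be written out for the
two-predictor case. The sketch uses the standard formula for a criterion and
two predictors; the function name is hypothetical, and the example values
are invented, because the intercorrelation of the two components is not
legible in Table 3.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: validities of predictors 1 and 2 against the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values (not the study's): a second predictor with a weak
# validity of its own still raises the combined validity above .30.
print(round(multiple_r(0.30, -0.10, 0.25), 3))
```

The example shows how a component with little validity by itself can add to
prediction when it is correlated with the first predictor.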

There is some evidence of homogeneous 
grouping by aptitude in this sample. A sim- 
ple analysis of variance revealed that the 
mean pilot stanines of the instructional groups 
varied more than might be expected if there 
had been no homogeneous grouping. The F- 
value was 1.79 which with 53 and 162 degrees
of freedom was significant at the .01 level.
Inspection of the groupings revealed that 
most of the groups were arranged alphabeti- 
cally but that a few groups did have consid- 
erable restriction in variability. To the ex- 
tent that homogeneous grouping did occur, 
one would expect attenuation in the validity 
of RPA scores. If no homogeneous grouping 
had occurred in any of the groups, the effect 
of RPA standing on success in pilot training
would have been even more pronounced than 
it was found to be here. 
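
The degrees of freedom quoted for the F test follow from the design: 54
groups give 54 - 1 = 53 between groups, and 216 cadets in 54 groups give
216 - 54 = 162 within groups. A minimal sketch of the simple analysis of
variance is given below; the function name and the small data set are
hypothetical and serve only to show the computation.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Three hypothetical groups of four stanines each (illustration only).
print(one_way_anova([[4, 5, 6, 5], [7, 8, 6, 7], [3, 4, 5, 4]]))
```

An F ratio well above 1 indicates that group means differ more than random
assignment of cadets to groups would be expected to produce.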


Implications 


Results of previous research studies have 
been confirmed in a practical situation. An 
instructor does not have a constant frame of 
reference for evaluating the performance of 
pilot trainees, just as subjects in other ex- 
periments lack a constant frame of reference 
for judging the beauty of a picture or the 
magnitude of a weight. To the extent that 
these results are generalizable to other train- 
ing situations in civilian as well as military 
life, certain implications are apparent. 

First, the true validities of aptitude tests 
are being underestimated when this phenome- 
non operates. The criterion is contaminated 
by irrelevant variance which is unrelated to 
the predictors. In this situation, a within- 
groups correlation coefficient would provide a 
better estimate of the validity of a predictor 
test since this statistic is unaffected by dif- 
ferences in group means. 
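
A within-groups correlation of the kind suggested here can be sketched as
follows. Deviations are taken from each group's own means and then pooled,
so that differences between group means drop out. The function name and the
data are hypothetical; this is one common way of pooling, not necessarily
the computation the authors had in mind.

```python
import math


def within_groups_r(groups):
    """Pooled within-groups correlation.

    `groups` is a list of groups; each group is a list of (x, y) pairs.
    """
    sxy = sxx = syy = 0.0
    for g in groups:
        mx = sum(x for x, _ in g) / len(g)
        my = sum(y for _, y in g) / len(g)
        for x, y in g:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
            syy += (y - my) ** 2
    return sxy / math.sqrt(sxx * syy)


# Two hypothetical groups with very different means but the same
# within-group relation between predictor (x) and criterion (y).
g1 = [(1, 10), (2, 11), (3, 12)]
g2 = [(7, 0), (8, 1), (9, 2)]
print(within_groups_r([g1, g2]))
```

In this example the ordinary correlation over all six cases would be
lowered by the difference in group means, while the within-groups
coefficient is not affected by it.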

Secondly, many students are being graded 
unfairly. Some students are given low grades 


or are eliminated from pilot training not be- 
cause their performance is below some abso- 
lute standard, but rather because it is below 
the average of the particular group in which
they happen to find themselves. When this 
happens, the nation does not get optimal uti- 
lization of the available qualified manpower. 
In the Air Force, the likelihood of grading 
bias can be reduced by assigning cadets to 
instructional groups in such a way that each 
group would have the same mean on the pilot 
stanine. When the means of all groups have 
been made equal, then the pilot stanine would 
correlate perfectly with RPA scores. In ci- 
vilian institutions it might be possible to as- 
sign students to sections of an undergraduate 
course in such a manner as to equate section 
means on some related aptitude score. When 
this is not desirable or convenient, an alterna-
tive might be to inform each instructor of the
general level of aptitude in his section. He
could use this information to guide his evalua-
tions. In certain cases it might be possible
to set up objective tests for measuring pro- 
ficiency. In other instances, as in the pilot 
training program for example, the instructors 
could be furnished a standardized set of case 
studies which contain objective and observ- 
able characteristics of persons judged to be 
making satisfactory or unsatisfactory progress. 
A related problem has to do with the changes 
in grading practices over time. Grades and 
attrition rates in any civilian or military train- 
ing program are not likely to be sensitive to 
fluctuations in the level of incoming talent. 
The standard for satisfactory performance
tends to slide up and down with the ability 
of the group. Any method for standardizing 
grading practices would help alleviate this 
problem. Where objective and valid selec- 
tion tests are employed, administrative action 
could be taken to vary grades and the attri- 
tion rate inversely with the level of talent 
selected. 


Summary


Does a flying instructor have an absolute 
frame of reference in judging which cadets 
pass and which fail, or does he have a rela- 
tive frame of reference so that his standard 
of what is acceptable varies with the quality 
of students he is currently instructing? Based 
on a sample of 216 aviation cadets sampled 
from one primary training base over a six-
year period of time, the analysis revealed that
a cadet has a better chance of success if he is 
grouped with cadets of relatively lower apti- 
tude than himself rather than with cadets of 
relatively higher aptitude. Thus, instructors 
in this study tended to have a relative frame 
of reference. To the extent that this phe- 
nomenon operates in other training situations, 
the nation is denied the services of the most 
highly qualified trained personnel, and the 
true validity of aptitude tests is underesti- 
mated. Methods of minimizing these dangers 
are discussed. 


Depth Perception in a Stereoscopic Display as a Function of
Number of Stimuli, Depth Range, and Number of
Scale Markers ¹


John E. Murray ²

Fordham University


The purpose of this study was to determine
the degree to which depth in a stereoscopic
presentation could be discriminated as a func-
tion of several of the variables which may
affect it. More specifically, the study was de-
signed to determine the effectiveness of pre-
senting three-coordinate data stereoscopically
for use in an operational situation, viz., for
the positioning of targets on a radar display.

The position of aircraft is usually deter-
mined by its direction in degrees and its dis-
tance in miles from some reference point as
well as by its altitude above the ground. Its
direction (bearing) and distance (range) can
be obtained with little difficulty from a
conventional Plan Position Indicator (PPI)
scope. The major source of difficulty is the
altitude dimension.

The problem of presenting three-coordinate
data has been solved, to some extent, by the
use of various coding systems such as varia-
tion in target size (1), shape (9), brightness
(6), and color (4, 5) and by the use of spe-
cial types of equipment (3). Another an-
swer to the problem might be found in the
use of a stereoscopic device to display the
three dimensions required (7). However, the
question of its effectiveness for the human op-
erator still remains. In the present status of
psychological research, little is known about
the ability of observers to use stereoscopic
displays and the number of stereo-depth steps
that observers can learn to identify is un-
known. Consequently, this study was un-
dertaken to provide data in a field that has
not been adequately developed for practical
application.

¹ This paper is based on a dissertation submitted
to the Graduate School of Fordham University in
partial fulfillment of the requirements for the degree
of Doctor of Philosophy. The writer is indebted to
Joseph F. Kubis and Richard T. Zegers, S.J., for
their generous guidance.

² Now at the U. S. Naval Training Device Center,
Office of Naval Research,

Two types of tasks were considered worthy
of investigation, namely, one in which the op- 
erator was required to rank targets on the 
basis of the altitude dimension and a second 
task in which the operator was required to 
ascertain the precise altitude dimension of 
specified targets. Thus, the study was di- 
vided into two phases. Phase I was designed 
to determine the effect of stimulus density and 
the range of depth on performance of a depth 
perception task. In this study, stimulus den- 
sity was defined as the number of targets 
(dots) presented at one time on a simulated 
scope; the depth range was the difference in 
inches between the display and the furthest 
dot in space. Phase II was designed to evalu- 
ate performance on a depth perception task 
where more refined measures were required. 
Thus, for this phase, the effect of the num- 
ber of reference markers required to specify 
depth-levels was studied in addition to the 
two variables considered in Phase I. It also 
seemed worthwhile to determine the consist- 
ency of performance on the depth percep- 
tion task under the experimental conditions 
studied. 

Phase I 

Method. To provide a depth perception task for 
Phase I of this study, photographic slides contain-
ing two disparate pictures of real dots in space
were constructed with a Stereo-Realist three-di-
mensional camera. The number of dots on each
slide was either 10, 20, 30, or 40 and the depth range
in which they were located was either 5, 10, or 15
inches. Each dot was placed in space at one of 50
possible stereo-depth levels. Consequently, when the
depth range was 5 inches, adjacent depth-levels were
0.1 inch apart; similarly, when the depth ranges
were 10 or 15 inches, adjacent depth-levels were 0.2
or 0.3 inch apart, respectively.
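
The spacing between adjacent depth-levels follows directly from dividing
each depth range by the 50 possible levels. The short sketch below repeats
that arithmetic; the function and constant names are hypothetical.

```python
LEVELS = 50  # number of possible stereo-depth levels per slide (from the text)


def level_spacing(depth_range_inches, levels=LEVELS):
    """Distance in inches between adjacent depth-levels."""
    return depth_range_inches / levels


for depth_range in (5, 10, 15):
    print(depth_range, round(level_spacing(depth_range), 2))
```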

During the testing session, each slide was presented 
through a stereoscopic projector as a circle of 10- 
inch diameter on a screen 36 inches in front of the 
S. The room was darkened with the only lighting 
furnished by the rays of the projector. 

The Ss for this phase of the study were 20 males 
between the ages of 24 and 35 years. Ss were re-
quired to meet the qualifications of normal vision
and depth perception which were determined from 
test scores on the Bausch and Lomb Ortho-Rater 
(2). To insure greater homogeneity of the group 
with regard to visual skill, only those Ss meeting 
specified minimum requirements were permitted into 
the depth perception testing situation. 

With 4 variations in the number of dots and 3 
variations in the depth range, 12 combinations of 
these variables were obtained. However, to increase 
the reliability of the measures taken, three forms of 
each combination were presented to each S. Conse-
quently, 36 measures from each S were obtained.
On each slide, S was required to rank the 10 dots
closest in space to him. 

For each combination of depth range and number 
of dots, three slides were presented. The combina- 
tions were presented in a random order to each S. 
The average value obtained for each combination of 
the variables was taken as the measure of perform-
ance for each individual.


Results. Scores were analyzed in terms of 
the time required to discriminate the depth 
of 10 dots and the number of errors made. 
Errors were analyzed by weighting the incor- 
rect responses in proportion to their opera- 
tional importance. Thus, if two dots at ad- 
jacent depth-levels were interchanged, one 
error was scored; if two dots at more than 
one depth-level apart were interchanged, two 
errors were scored; if a dot were substituted 
or omitted from the series of the 10 closest 
dots, three errors were scored. In addition, 
the frequency with which each type of error 
occurred was obtained without the application 
of any weighting system. 
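
The weighting scheme can be made concrete with a small sketch. The weights
of one, two, and three errors are those stated in the text; the category
labels, the function name, and the error counts in the example are
hypothetical.

```python
WEIGHTS = {
    "adjacent_transposition": 1,  # two dots at adjacent depth-levels interchanged
    "distant_transposition": 2,   # dots more than one depth-level apart interchanged
    "substitution": 3,            # a dot outside the 10 closest was included
    "omission": 3,                # one of the 10 closest dots was left out
}


def weighted_error(counts):
    """Weighted error score for one slide, given a count per error type."""
    return sum(WEIGHTS[kind] * n for kind, n in counts.items())


# Hypothetical response to one slide (counts invented for illustration).
slide = {"adjacent_transposition": 2, "distant_transposition": 1, "substitution": 1}
print(weighted_error(slide))
```

The unweighted analysis mentioned in the text corresponds to counting each
error type separately without applying these weights.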

The mean weighted error scores for the 
discrimination of the depth of 10 dots un- 


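Table 1


Mean Weighted Error Scores Obtained as a Function
of Number of Dots Presented and
the Depth Range

[Columns: Number of Dots (10, 20, 30, 40); Depth Range (5”, 10”, 15”); Average Error.
Most cell values are not legible in the source scan; legible entries include 0.2 and 0.5
for 10 dots, 4.2 and 4.2 for 20 dots, 3.8 for 30 dots, and an average of 4.1.]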


Table 2


Frequency of Types of Error Scores as a Function of the
Number of Dots Presented and Depth Range

[Columns: Depth Range; Type of Error; Number of Dots (10, 20, 30, 40); Total.
Rows under each depth range (5”, 10”, 15”): Transposing one depth level;
Transposing more than one level; Substitution; Omission; Total. A Grand total
row closes the table. The cell frequencies are not legible in the source scan.]


der varying conditions of depth range and 
the number of dots presented are shown in 
Table 1. It becomes obvious from the table 
that, as the depth range is increased, the 
errors in depth discrimination are decreased; 
as the number of dots presented is increased, 
the errors of depth discrimination are gener- 
ally increased. 

When each error was considered of equal 
value and not weighted according to any ex- 
ternal operational criteria, the frequency of 
each type of error shown in Table 2 was ob- 
tained. The results of this analysis are the 
same as those obtained when the error scores 
were weighted, i.e., performance improved as 


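Table 3


Mean Time Scores (in Hundredths of a Minute) on a
Depth Perception Task Under the Experi-
mental Conditions Specified

[Columns: Number of Dots (10, 20, 30, 40); Depth Range (5”, 10”, 15”); Average Time.
Most cell values are not legible in the source scan; legible entries include 19.1
for 10 dots, 26.6 for 20 dots, and 44.3 for 40 dots.]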







the depth range increased and the number of 
dots presented at one time decreased. 

The mean time required for the discrimi- 
nation of the depth of 10 dots under varying 
experimental conditions is shown in Table 3. 
These results are generally consistent with 
those obtained with the error scores. 

Both the time and error scores were sub- 
jected to analyses of variance. In all cases, 
the analyses indicated that all the experi-
mental variables produced a significant in-
fluence on both the time and error scores.


Phase II


Method. In this phase of the study, photographic 
slides were also employed. Since accurate identifi-
cation of the depth-level of each dot presented was
required, reference markers were included. These
markers were small rings on either side of the 10-inch
circular display and one, two or five markers were
tested in different trials. With these three variations
in the number of scale markers, three ranges of depth
and four variations in the number of dots, it was
possible to construct 36 slides including all the com-
binations of these variables. An alternate form con-
sisting of assignment of the depth-levels to the dots
in a reverse order to their original presentation was
also constructed for each combination of the vari-
ables.

In all, 72 individuals served as Ss, two being tested
on each of the 36 slides. The Ss were 72 males be-
tween the ages of 20 and 35 years selected on the
basis of Ortho-Rater scores described in Phase I.
Each S was tested on a randomly assigned slide;
tested again on an alternate form of that slide; and
retested on the slide initially presented to determine
the consistency of the results obtained.


Results. In this phase, performance meas- 
ures were obtained from the time (in hun- 
dredths of a minute) required to specify the 
depth of 10 stimuli consecutively numbered 
and from error scores expressed as the differ- 
ence between the actual depth of the dots and 
that estimated by S. To equate the number 
of items for each of the experimental condi- 
tions, only the first 10 responses were consid- 
ered in the analysis of the results. The re- 
sults obtained among all the experimental 
conditions were subjected to analyses of
variance.


The mean time and error scores obtained 
as a function of the experimental variables 
studied in the initial testing are shown in 
Table 4. Since similar trends were indicated 

Table 4


Initial Testing Mean Time (in Hundredths of a
Minute) and Error Scores as a Function
of the Variables Specified

[Columns: Variable; Time Scores; Error Scores.
Rows: Range (5”, 10”, 15”); Number of dots (10, 20, 30, 40);
Number of markers (1, 2, 5); Grand mean.
The cell values are not legible in the source scan.]



in the mean time and error scores obtained 
when using an alternate form and the retest 
slide, these data are not presented here but,
with the variance analyses, can be found in
the original study (7). However, slight varia- 
tions in the significance of the effects of the 
variables on these time and error scores were 
indicated. 

The following results of the experiments 
seem most worthy of note: 

1. The number of dots presented at one 
time did not affect the accuracy with which 
the depth perception task was performed; on 
the other hand, the number of dots did affect 
the time scores on the initial and retest pres- 
entations of the test stimuli. 

2. The depth range affected the error scores
in every testing and had no effect on the 
time scores. 

3. The number of markers significantly in- 
fluenced the error scores in every testing; the 
number of markers influenced the time scores 
only in the initial and retest presentations of 
the test stimuli. 

4. The results obtained from the initial and 
retest presentations indicate a high degree of 
consistency. 







Discussion 


Number of dots. In Phase I, the number 
of dots presented significantly influenced the 
accuracy with which the depth perception 
task could be performed; on the other hand, 
in Phase II, the number of dots presented at 
one time did not affect the accuracy of per-
formance. One possible explanation of the
difference may be the nature of the tasks per-
formed in each phase. In Phase I, the S was
required to name the 10 dots closest in space
to him. This task required S to scan the en-
tire picture and to consider every dot regard-
less of the number of dots presented. Thus,
as the number of dots increased, the prob-
ability of omitting, substituting or transpos-
ing dots was also increased, and therefore,
the number of errors. In Phase II, the S
was required to specify the depth of one dot
at a time. In this task, it seems plausible to
assume that Ss mentally excluded from con-
sideration all other dots but the one under
observation, and consequently, the additional 
dots did not influence the accuracy of per- 
formance to any significant degree. 

Depth range. Of the experimental vari- 
ables investigated in this study, it is inter- 
esting to note that depth range produces the 
most consistent effects upon the accuracy of 
depth discrimination. In both phases of this 
study, accuracy of performance on the depth 
perception tasks was significantly improved 
as the depth range was increased. However, 
this becomes quite understandable upon closer 
examination of the task performed under each 
of the experimental conditions. For each 
combination of the variables, assignment of a
dot to one of 50 possible depth-levels was re-
quired. As the depth range was decreased,
adjacent depth-levels came closer together
with only 0.1 inch between them at the small-
est depth range of 5 inches. Consequently,
the task of discriminating depth-levels be-
came more difficult and error scores increased.
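
The spacing argument can be checked with a short calculation. The sketch below assumes that the 50 depth-levels are spread evenly over the depth range, which is what the 0.1-inch figure quoted for the 5-inch range implies. The 10- and 15-inch ranges are read from the row labels of Table 4, and the function name is illustrative only.

```python
# Spacing between adjacent depth-levels, ASSUMING the 50 levels are
# spread evenly over the depth range (consistent with the 0.1 inch
# quoted in the text for the 5-inch range).
DEPTH_LEVELS = 50


def level_spacing(depth_range_inches: float, levels: int = DEPTH_LEVELS) -> float:
    """Return the distance in inches between adjacent depth-levels."""
    return depth_range_inches / levels


# Depth ranges in inches: 5 is stated in the text; 10 and 15 are
# taken from the Table 4 row labels.
for depth_range in (5, 10, 15):
    spacing = level_spacing(depth_range)
    print(f"{depth_range:>2}-inch range: {spacing:.1f} inch between levels")
```

Under this assumption the separation between adjacent levels triples from the smallest to the largest range, which is in line with the lower error scores reported for the larger ranges.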

Similar results were obtained for the time
required to rank the 10 dots in Phase I. On
the other hand, in Phase II, since the dots
were placed numerically next to one another
and were considered individually, the S’s re-
sponses became almost automatic, and no
differences in the time scores resulted from
changes in the depth range.

Number of markers. Several interesting 
problems have arisen in determining the ef- 
fects of the scale markers on performance of 
a task requiring an accurate identification of 
the depth-level of a dot in space. One ques-
tion which remains to be answered is to what
degree depth discrimination is affected by
the distance between a dot and the nearest
marker. This was suggested by the incon-
sistency in the time required to complete the
task on the initial slide and the alternate 
form. It is quite possible that the differences 
in the results obtained on the “alternate” 
forms might be partially accounted for by 
the variation in the positioning of the dots 
presented. 

Another important feature to be considered 
in evaluating the effects of the number of 
markers on depth discrimination is the num-
ber value assigned to the marker.
Throughout the test administrations con- 
ducted in Phase II of this study, error scores 
consistently decreased as the number of 
markers available for estimating depth-levels 
of the dots was increased. However, when 
time scores were analyzed, performance was
significantly affected only in the initial and
final experiments. In both instances, a lack 
of orderly progression of the time scores was 
evident and seemed to be related to a rela- 
tively long time required to complete the task 
when only two markers were available. These 
markers were located at the 20- and 50-unit 
levels and consequently, did not divide the 
near and far portions of the depth scale into 
an equal number of units. Thus, it seems 
that a longer time was required for the Ss to 
divide the scale into units mentally, but ac- 
curacy of performance was not similarly af- 
fected. Evidently, this problem requires fur- 
ther study to determine the best arrangement 
of the markers for most efficient performance. 

One surprising result of reducing the num-
ber of scale markers was that, in spite of lessened
assistance to the S, the differences in perform-
ance remained relatively small when consid-
ered from a practical viewpoint. The widest
range of mean time scores in any one experi-
ment, for example, was from 79.0 to 100.2
hundredths of a minute as a function of varia-
tion in the number of markers (Table 4); in
the same experiment, mean error scores varied 
from 14.5 to 31.4 for 10 dots. In operational 
terms, the individual target time (in seconds) 
and error are the important considerations. 
Thus, from the data obtained in this study, 
the widest range of time scores was from 4.7 
to 6.0 seconds per target; the widest range of 
error scores was from 1.4 to 3.1 units. 
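
The per-target figures follow from the 10-dot means by a change of units. The sketch below shows the conversion, assuming that each quoted mean covers a presentation of 10 dots, as the text states for the error scores; the function names are illustrative only.

```python
# Convert mean scores for a 10-dot presentation into per-target terms.
# ASSUMPTION: each quoted mean covers 10 dots (targets).
DOTS_PER_PRESENTATION = 10


def seconds_per_target(hundredths_of_minute: float,
                       dots: int = DOTS_PER_PRESENTATION) -> float:
    """Time score in hundredths of a minute -> seconds per target."""
    total_seconds = hundredths_of_minute / 100 * 60
    return total_seconds / dots


def error_per_target(total_error: float,
                     dots: int = DOTS_PER_PRESENTATION) -> float:
    """Error score summed over all dots -> depth units per target."""
    return total_error / dots


# Extreme mean scores quoted in the text.
for time_score in (79.0, 100.2):
    print(f"{time_score} hundredths of a minute = "
          f"{seconds_per_target(time_score):.2f} seconds per target")

for error_score in (14.5, 31.4):
    print(f"error score {error_score} = "
          f"{error_per_target(error_score):.2f} units per target")
```

To one decimal place these conversions agree with the ranges of about 4.7 to 6.0 seconds and 1.4 to 3.1 units given above.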

As a result, there seems to be very little 
practical significance in the difference of less
than two seconds required to specify the depth of
one target as a function of varying the num- 
ber of markers, and consequently, one marker 
can be substituted for five in certain opera- 
tional situations. It is interesting to note 
that, when accuracy of performance is con- 
sidered, even the maximum error of three 
units out of 50 possible depth-levels, or six 
per cent, is practically negligible. A stereo- 
scopic display with this accuracy should be 
worthy of consideration in any presentation 
of three-coordinate data. 
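
The two practical figures in this argument can be reproduced from numbers already given in the text. The sketch below uses only the per-target time range (4.7 to 6.0 seconds), the maximum error of three units, and the 50 depth-levels; the function name is illustrative only.

```python
# Practical size of the marker effect, using only figures quoted in the text.
DEPTH_LEVELS = 50


def error_percent(error_units: float, levels: int = DEPTH_LEVELS) -> float:
    """Express an error in depth units as a percentage of the full scale."""
    return 100 * error_units / levels


fastest_seconds = 4.7   # shortest mean time per target quoted in the text
slowest_seconds = 6.0   # longest mean time per target quoted in the text
time_spread = slowest_seconds - fastest_seconds

print(f"time spread: {time_spread:.1f} seconds per target")
print(f"maximum error: {error_percent(3):.0f} per cent of the scale")
```

On these figures the time penalty for using fewer markers stays under two seconds per target, and the largest error amounts to six per cent of the scale.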


Summary 


The effects of number of stimuli presented 
at one time, depth range, and the number of 
scale markers presented on depth discrimi- 
nation in a stereoscopic presentation were 
determined. Ss were tested on two tasks, 
namely, ranking dots in order of depth and 
specifying the actual depth of dots in space. 
Based upon analyses of variance of both 
time and error scores, the following conclu- 
sions were reached: 

1. When ranking of depth-levels was re-
quired, as the number of dots presented at
one time was increased, accuracy was signifi- 
cantly decreased; when assignment of spe- 
cific depths was required, the number of dots 
did not affect accuracy. 

2. As the number of dots was increased, the 
time required to complete the task was sig- 
nificantly increased. 


3. As the depth range increased, accuracy 
of performance was significantly improved re- 
gardless of the task performed. 

4. As depth range increased, time scores 
decreased significantly when ranking depth- 
levels was required; depth range had no effect 
on time scores when specific assignment of 
depth-levels was required. 

5. As the number of scale markers de- 
creased, both error and time scores were sig- 
nificantly increased. 